Running Spark-Shell on Windows

I have downloaded Spark, sbt, Scala, and git onto my Windows computer. When I try to run spark-shell in my command prompt, I get "Failed to find Spark assembly JAR. You need to build Spark with sbt\sbt assembly before running this program."
I tried to follow this guide: https://x86x64.wordpress.com/2015/04/29/installing-spark-on-windows/, but I don't have a build subfolder, so I am not sure if that is the problem.
Any help would be appreciated.

That's an old guide, written for Spark 1.3.
Please use this guide to set up Spark on Windows:
http://www.ics.uci.edu/~shantas/Install_Spark_on_Windows10.pdf
The guide builds with Maven whereas you plan to use sbt, but following it will still let you run spark-shell.

Related

How do I clear the cache in Pentaho Data Integration

I am trying to run Pentaho Data Integration (ver. 8.3) on my Windows machine and it is not working.
These are the steps I tried to make it work:
Tried rebooting the machine without success.
Also tried to run the Spoon.bat command directly from the directory where Pentaho is located, but it did not work.
Checked whether my Java installation had changed since the last time it worked; it had not. What could be happening?
In a support chat I read that someone fixed the problem by clearing the cache, but they did not explain how. How do I clear the cache?
Have you installed a JDK, version 1.8 or later?
Open PowerShell or another terminal to check: java -version
These are the steps to clear the cache in a Windows environment:
Go to C:\Users\youruser\.kettle
Look for the file db.cache-* (I have PDI version 8.3, my file is named db.cache-8.3.0.0-371)
Edit the file with any editor (e.g. Notepad) and erase all content
It worked for me!
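The manual steps above could also be scripted. The following is only a sketch, assuming the .kettle folder sits in the current user's home directory and that every file matching db.cache-* should be emptied; the function name clear_kettle_db_cache is made up for illustration and is not part of PDI. Close Spoon before running it.

```python
from pathlib import Path


def clear_kettle_db_cache(kettle_dir):
    """Empty every db.cache-* file in kettle_dir and return the files touched."""
    cleared = []
    for cache_file in sorted(Path(kettle_dir).glob("db.cache-*")):
        if cache_file.is_file():
            # Erase the content but keep the file, as in the manual steps.
            cache_file.write_text("")
            cleared.append(cache_file)
    return cleared


if __name__ == "__main__":
    # Assumption: the .kettle folder lives in the user's home directory.
    for path in clear_kettle_db_cache(Path.home() / ".kettle"):
        print(f"cleared {path}")
```

If the folder does not exist, the function simply returns an empty list.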
First, run .\data-integration\set-pentaho-env.bat; it sets up the PDI environment. The most important settings are the Java home and the Java version: PDI 8.3 only runs on Java 1.8 and above.

Getting Started with hadoop on windows

I am a newbie here. I just love to code and develop my own programs. The day before yesterday I got the idea to set up Hadoop on Windows. I fetched all the stacks but could not install it successfully. I am attaching screenshots together with my query. My Windows version is 8.1, 64-bit.
The screenshot you provided shows that JAVA_HOME is not set correctly. Can you make sure that JAVA_HOME is set properly on your system?
Please verify that the javac and java commands work from your command prompt.
Alternatively, can you post the content of hadoop-env.cmd so that we can find the root cause?
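The checks suggested above can be run in one go. This is a minimal sketch, not a Hadoop tool: the function name check_java_setup is hypothetical, and it only tests that JAVA_HOME points at a directory with a bin subfolder and that java and javac can be found on PATH.

```python
import os
import shutil
from pathlib import Path


def check_java_setup(env):
    """Return a list of problems found in the environment mapping env."""
    problems = []
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not (Path(java_home) / "bin").is_dir():
        problems.append(f"JAVA_HOME has no bin directory: {java_home}")
    # Both commands should resolve from the command prompt.
    for tool in ("java", "javac"):
        if shutil.which(tool, path=env.get("PATH", "")) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_java_setup(os.environ):
        print(problem)
```

An empty result means these basic checks passed; it does not prove that hadoop-env.cmd itself is correct.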

How to run Nutch in Hadoop installed in pseudo-distributed mode

I have Nutch 1.13 installed on my Ubuntu machine. I can run a crawl in standalone mode; it runs successfully and produces the desired results, but I have no idea how to run it in Hadoop. I have Hadoop installed in pseudo-distributed mode and I want to run a Nutch crawl with Hadoop and monitor it. How can I do it? There are a lot of tutorials for running it in standalone mode, but I couldn't find any clear instructions on how to run it in Hadoop, except that I have to use the "Nutch Job" after I build it with ant.
Thanks for your help.
Make sure you have built Nutch from source, i.e. don't use the binary release, which works only in local mode. Once you've compiled with
ant clean runtime
go to runtime/deploy/bin and run the scripts as usual.
NB you need to modify the conf files prior to recompiling.
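Before running the scripts, it may help to confirm that the build actually produced a deployable job. The sketch below rests on an assumption that is not stated in the answer: that the ant build leaves a *.job archive (the "Nutch Job" mentioned in the question) directly under runtime/deploy. The function name find_nutch_job_files is made up for illustration.

```python
from pathlib import Path


def find_nutch_job_files(nutch_home):
    """List *.job archives under <nutch_home>/runtime/deploy (assumed layout)."""
    deploy_dir = Path(nutch_home) / "runtime" / "deploy"
    return sorted(deploy_dir.glob("*.job"))


if __name__ == "__main__":
    # Assumption: the script is started from the Nutch source root.
    jobs = find_nutch_job_files(".")
    if jobs:
        print("found job file(s):", ", ".join(j.name for j in jobs))
    else:
        print("no .job file under runtime/deploy; rebuild with ant")
```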

build spark on windows from source code

It would be nice if I could download the Spark source code from GitHub, build it with sbt on my Windows machine, and use IntelliJ to make small modifications to the code base. I have installed Spark on Windows quite a few times before, but always from the packaged tarball rather than the source code. Has anyone built the source code on a Windows machine before?
You also need to account for the difference between \r\n and \n line endings. Use the dos2unix utility to convert files to Linux line endings, and make sure you are using an up-to-date version of Cygwin when installing and running the Hadoop utilities.
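To make the line-ending point concrete, here is a small Python stand-in for what dos2unix does to a file. It is a sketch, not the dos2unix tool itself, and the function names crlf_to_lf and convert_file are made up for illustration.

```python
from pathlib import Path


def crlf_to_lf(data):
    """Replace Windows line endings (\\r\\n) with Unix ones (\\n) in bytes."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file in place; return True if anything changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = crlf_to_lf(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False
```

Working on bytes rather than text avoids any guess about the file's encoding.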
I found the spark developer tools page and it was very helpful. I needed "build/sbt compile"
http://spark.apache.org/developer-tools.html#reducing-build-times

Running Apache Spark on Windows 7

I am trying to run Apache Spark on Windows 7. First I installed sbt from the MSI installer, then extracted the spark-1.0.0 files to Program Files with 7-Zip. In the command line, I wrote the following:
spark-directory: sbt/sbt assembly
After a few seconds of processing, I got errors like:
-server access error: connection timed out
-could not retrieve jansi 1.1
-error during sbt execution: error retrieving required libraries
-unresolved dependency, jansi 1.1 not found
Could you please give me some advice about running Spark on Windows? I am looking for the right way because I am completely new to this technology. Regards.
You could use the pre-built spark from here
The scripts inside the bin folder work on Windows 7.
You need to set the HADOOP_HOME variable in your environment.
See "spark on windows" for more information.
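A quick way to sanity-check the HADOOP_HOME setting is sketched below. The check for bin\winutils.exe is an assumption added here, not something the answer states: Spark on Windows commonly expects that helper under %HADOOP_HOME%\bin. The function name hadoop_home_problems is hypothetical.

```python
import os
from pathlib import Path


def hadoop_home_problems(env):
    """Return a list of problems with HADOOP_HOME in the mapping env."""
    hadoop_home = env.get("HADOOP_HOME", "")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    problems = []
    # Assumption: winutils.exe is expected in the bin subfolder on Windows.
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"missing {winutils}")
    return problems


if __name__ == "__main__":
    for problem in hadoop_home_problems(os.environ):
        print(problem)
```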
If you take the build-with-sbt approach, you'll also need git.
Install Scala, sbt and git on your machine. Download the Spark source code and run the following command:
sbt assembly
If you use the prebuilt release, here is the step-by-step process:
How to run Apache Spark on Windows7 in standalone mode

Resources