I'm trying to configure MapReduce in Eclipse Indigo with Hadoop version 2.5. I downloaded the Hadoop 2.5 source and added all the libraries to the Eclipse project.
While trying to run the project, it shows the following error.
The Java path and classpath are set properly. Please help!
Is configuring Cygwin SSH mandatory to use Eclipse MapReduce?
I am not sure what you are trying to do here. If you are running the application in Eclipse as a regular, traditional Java program, the following may help.
Hadoop MapReduce programs are normally run with the hadoop jar command, usually after using SSH (e.g. PuTTY) to get onto the cluster and SFTP/FTP (e.g. FileZilla) to transfer the .jar file to it.
Usage: hadoop jar <jar> [mainClass] args...
If you want to debug the application, use java.util.logging.Logger.
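For instance, a minimal sketch of wiring java.util.logging into a driver class (the class name and messages below are placeholders, not taken from the question):

import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

public class WordCountDriverDebug {
    // Hypothetical driver class used only to illustrate the logging setup.
    private static final Logger LOG = Logger.getLogger(WordCountDriverDebug.class.getName());

    public static void main(String[] args) {
        LOG.info("Starting job with args: " + Arrays.toString(args));
        try {
            // ... configure and submit the MapReduce job here ...
            LOG.fine("Job configured");
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Job submission failed", e);
        }
    }
}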
I've been spending some time banging my head over trying to run a complex Spark application locally in order to test more quickly (without having to package and deploy to a cluster).
Some context:
This Spark application interfaces with the DataStax Enterprise version of Cassandra and its distributed file system, so it needs some explicit jars to be provided (not available in Maven)
These jars are available on my local machine, and to "cheese" this, I tried placing them in SPARK_HOME/jars so they would be automatically added to the classpath
I tried to do something similar with the required configuration settings by putting them in spark-defaults.conf under SPARK_HOME/conf
When building this application, we do not build an uber jar, but rather do a spark-submit on the server using --jars
The problem I'm facing is that when I run the Spark application through my IDE, it doesn't seem to pick up any of these additional items from the SPARK_HOME directory (config or jars). I spent a few hours trying to get the config items to work and ended up setting them as System.property values in my test case before starting the Spark session, so the configuration settings can be ignored here.
However, I do not know how to reproduce this for the vendor-specific jar files. Is there an easy way I can emulate the --jars behavior of spark-submit and somehow set up my Spark session with these jars? Note: I am using the following command in my code to start a Spark session:
SparkSession.builder().config(conf).getOrCreate()
Additional information, in case it helps:
The Spark version I have locally in SPARK_HOME is the same version that my code is compiling with using Maven.
I asked another question similar to this one about the configs: Loading Spark Config for testing Spark Applications
When I print the SPARK_HOME environment variable in my application, I get the correct SPARK_HOME value, so I'm not sure why neither the configs nor the jar files are being picked up from there. Is it possible that, when running the application from my IDE, it's not picking up the SPARK_HOME environment variable and is using all defaults?
You can make use of .config(key, value) while building the SparkSession, passing "spark.jars" as the key and a comma-separated list of paths to the jars, like so:
SparkSession.builder().config("spark.jars", "/path/jar1.jar, /path/jar2.jar").config(conf).getOrCreate()
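If you already build up a SparkConf, an equivalent approach is to attach the jars to the conf with setJars before creating the session. A minimal sketch in Java (the paths and app name are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class LocalJarsExample {
    public static void main(String[] args) {
        // Placeholder paths -- point these at the vendor jars on your machine.
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("local-test")
                .setJars(new String[] {"/path/jar1.jar", "/path/jar2.jar"});

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // ... run the code under test ...
        spark.stop();
    }
}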
When specifying a jar via "spark.jars" and running on standalone Spark without spark-submit, where is the jar loaded from?
I have a Spring application that performs some spark operations on a Spark standalone running in Docker.
My application relies on various libraries such as the MySQL JDBC driver, Elasticsearch, etc., and thus it fails to run on the cluster, which doesn't have them.
I assembled my jar with all its dependencies and moved it to the /jars directory in Docker. But still no luck.
13:28:42.577 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Fetching spark://192.168.99.1:58290/jars/xdf-1.0.jar with timestamp 1499088505128
13:28:42.614 [dispatcher-event-loop-0] INFO org.apache.spark.executor.Executor - Executor is trying to kill task 0.3 in stage 1.0 (TID 7)
13:28:42.698 [Executor task launch worker-0] DEBUG org.apache.spark.network.client.TransportClient - Sending stream request for /jars/xdf-1.0.jar to /192.168.99.1:58290
13:28:42.741 [shuffle-client-7-1] DEBUG org.apache.spark.rpc.netty.NettyRpcEnv - Error downloading stream /jars/xdf-1.0.jar.
java.lang.RuntimeException: Stream '/jars/xdf-1.0.jar' was not found.
Now I noticed that it's looking for the jar on the driver host, but I don't understand where it's trying to deploy it from.
Does anyone have an idea where it's looking for that jar?
I figured it out. The jars are loaded from the driver node.
So I didn't need to move my jar to the Spark nodes; I just had to set the correct path to the dependency jar.
So this solved it:
spark.jars=./target/scala-2.1.1/xdf.jar
If you're essentially running a standalone application in local mode, you will need to provide all jars on your own, as opposed to having spark-submit stage the Spark runtime for you. Assuming you're using a build system such as Maven or Gradle, you will need to package all transitive dependencies with your application and remove any provided-scope declarations.
The easiest option in this case is to use an assembly plugin or the maven-shade plugin to package a fat jar and then run that.
If you're running in cluster mode, you can programmatically submit your application using SparkLauncher. Here's an example in Scala:
import org.apache.spark.launcher.SparkLauncher
object Launcher extends App {
  val spark = new SparkLauncher()
    .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/user/example-assembly-1.0.jar")
    .setMainClass("MySparkApp")
    .setMaster("local[*]")
    .launch()

  spark.waitFor()
}
Keep in mind that in YARN mode, you will also have to provide the path to your Hadoop configuration.
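One way to do that with SparkLauncher is to pass HADOOP_CONF_DIR through the launcher's environment. A sketch in Java; the HADOOP_CONF_DIR path and the yarn-cluster master value are assumptions for a typical YARN setup:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkLauncher;

public class YarnLauncher {
    public static void main(String[] args) throws Exception {
        // Assumed location of the Hadoop client configuration -- adjust to your installation.
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf");

        Process spark = new SparkLauncher(env)
                .setSparkHome("/home/user/spark-1.4.0-bin-hadoop2.6")
                .setAppResource("/home/user/example-assembly-1.0.jar")
                .setMainClass("MySparkApp")
                .setMaster("yarn-cluster")
                .launch();

        spark.waitFor();
    }
}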
I have installed Hadoop 1.2.1 on a three-node cluster. While installing Oozie, when I try to generate a WAR file for the web console, I get this error.
hadoop-mapreduce-client-core-[0-9.]*.jar' not found in '/home/hduser/hadoop'
I believe the version of Hadoop that I am using doesn't have this jar file (I don't know where to find it). So can anyone please tell me how to create the WAR file and enable the web console? Any help is appreciated.
You are correct. You have two options:
1. Download the individual jars and put them inside your hadoop-1.2.1 directory, then generate the WAR file.
2. Download Hadoop 2.x and use it while creating the WAR file; once it has been built, continue using your hadoop-1.2.1.
For example, for oozie-3.3.2:
bin/oozie-setup.sh prepare-war -hadoop hadoop-1.1.2 ~/hadoop-eco/hadoop-2.2.0 -extjs ~/hadoop-eco/oozie/oozie-3.3.2/webapp/src/main/webapp/ext-2.2
Here I have built Oozie 3.3.2 to use with hadoop-1.1.2, but built it using hadoop-2.2.0.
HTH
I want to use the Hadoop Eclipse plugin to run the WordCount example.
My setup: local: Windows 7, Eclipse Juno (4.2.2), hadoop-1.2.1 unpacked; remote: Debian 7.1 with the same Hadoop version installed and tested.
I followed the instructions at http://iredlof.com/part-4-compile-hadoop-v1-0-4-eclipse-plugin-on-ubuntu-12-10/ and built the plugin on the Windows machine.
Hadoop is running; I tested it with the hadoop-examples wordcount and with my freshly created WordCount.
What works with the plugin:
I can create a new MR project
I can add new MR location (remote in my case)
I can browse/upload/download/delete files from DFS.
What doesn't work:
I cannot run my code (using Run As ... Run on Hadoop). The console prints "ClassNotFoundException: WordCountReducer". The same error can be found in the Hadoop job logs.
I exported the jar from my project, copied it to the remote machine, and launched Hadoop from the command line. Everything worked as expected.
I also saw that when manually launching the project on the remote machine, Hadoop creates a job.jar in the user's .staging directory. When launching the project from Eclipse, this jar is missing.
My question is: How can I run my project from Eclipse plugin?
Thanks
Set the user from your job driver.
System.setProperty("HADOOP_USER_NAME", "YourUbuntuUserID");
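For example, a minimal WordCount driver sketch for Hadoop 1.x with the property set before anything touches the cluster (the user ID, paths, and class layout here are illustrative, not taken from the question's project):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Set the remote user before any HDFS or JobTracker calls are made.
        System.setProperty("HADOOP_USER_NAME", "YourUbuntuUserID");

        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}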
It might work. Try and let me know.
I am trying to build a program on Hadoop with Ubuntu. I am able to successfully install and run Hadoop on my machine in pseudo-distributed mode, but when I tried to use the Eclipse plugin to create a project, I ran into several issues. After entering the parameters for connecting to the server in the Eclipse plugin, I get the following error:
1. Error: java.io.IOException: Unknown protocol to JobTracker: org.apache.hadoop.hdfs.protocol.ClientProtocol
I am using Hadoop 0.20, and the Eclipse plugin is also the one shipped with that distribution. Any suggestion or reason why these errors are coming up? And what can I do to build a Hadoop project in Eclipse?
Go to "Edit hadoop location".
Swap the Map/Reduce Master port with the DFS Master port.
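That error usually means HDFS client-protocol calls are being sent to the JobTracker port, i.e. the two endpoints in the location dialog are swapped. As a reference, here is a minimal sketch in Java (the host and ports are the common Hadoop 0.20 pseudo-distributed defaults and are assumptions; copy the real values from your core-site.xml and mapred-site.xml) of which configuration key each plugin field should match:

import org.apache.hadoop.conf.Configuration;

public class LocationSanityCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The DFS Master field should match fs.default.name (the NameNode address).
        conf.set("fs.default.name", "hdfs://localhost:9000");
        // The Map/Reduce Master field should match mapred.job.tracker (the JobTracker address).
        conf.set("mapred.job.tracker", "localhost:9001");
        System.out.println("DFS Master        -> " + conf.get("fs.default.name"));
        System.out.println("Map/Reduce Master -> " + conf.get("mapred.job.tracker"));
    }
}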