Getting started with Hadoop and Eclipse - hadoop

I'm following a couple of tutorials for setting up Hadoop with Eclipse.
This one is from Cloudera: http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
But this seems to focus on checking out the latest code from Hadoop and tweaking it.
That seems like a rare case, though; won't the latest release of Hadoop suffice for most users' needs?
Whereas this tutorial seems to focus on setting up and running Hadoop:
http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
I just want to run some basic MapReduce jobs to get started. I don't think I should be using the latest code from Hadoop, as the first link above specifies, should I?

Here is a blog entry and screencast on developing/debugging applications in Eclipse. The procedure works with most versions of Hadoop.
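If the goal is just to run some basic MapReduce jobs first, a simple sanity check before involving Eclipse is to run one of the example jobs that ship with Hadoop. This is only a sketch: it assumes a Hadoop 2.x layout with $HADOOP_HOME set and HDFS running, and the examples jar name and location vary between releases.
# Copy some small input files into HDFS
hadoop fs -mkdir -p input
hadoop fs -put $HADOOP_HOME/etc/hadoop/*.xml input
# Run the bundled WordCount example and look at the result
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
hadoop fs -cat 'output/part-r-*'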

You may try this tutorial on installing the Hadoop plugin for Eclipse: http://bigsonata.com/?p=168

Related

How do we install Apache BigTop with Ambari?

I am trying to find out how to deploy a Hadoop cluster with Ambari using Apache Bigtop.
According to the latest release, Bigtop 1.5:
https://blogs.apache.org/bigtop/
my understanding is that Bigtop Mpack was added as a new feature that enables users to
deploy Bigtop components via Apache Ambari.
I am able to install the Bigtop components via the command line, but cannot find any documentation on how to install these Bigtop Hadoop components via Ambari.
Can someone please point me to documentation that explains how to install the various Hadoop components (Bigtop packages) via Ambari?
Thanks,
I'm from the Bigtop community. Though I don't have a comprehensive answer, the Bigtop user mailing list recently had a discussion with several technical details that can answer your question:
https://lists.apache.org/thread.html/r8c5d8dfdee9b7d72164504ff2f2ea641ce39aa02364d60917eaa9fa5%40%3Cuser.bigtop.apache.org%3E
OTOH, you are always welcome to join the mailing list and ask questions. Our community is active and happy to answer questions.
Build a repo of Bigtop.
To install that repo with Ambari, you have to register the stack/version. You will need to create a version file. I found an example of one here.
Complete the installation like you would with a normal build.
This is highly theoretical (I haven't done this before). I have worked with a Bigtop Mpack before that took care of some of this work, but it's not production ready yet and works with an old version of Ambari, not the newest (I was able to install/stop/start HDFS/Hive). The instructions above should work with any version of Ambari.
I have been able to test Matt Andruff's theory with a VM. Here was my process and where I stopped:
Built a repo of Apache Bigtop 1.5.0
Built Bigtop using the Gradle wrapper (./gradlew)
Installed Apache Ambari 2.6.1 on my system
Enabled the BigInsights build version XML file and modified the package version numbers to match my Bigtop build
Note: You can also build your own version file if you want, as Matt mentioned
Set up a webserver to host the package repo (see the rough command sketch after this answer)
Pointed the XML version file's repo URL at the local webserver for packages
From there you can complete the installation of your packages as you would normally.
I have only done this with a single VM thus far and will be trying to spin up a small cluster using AWS in the coming weeks.
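For reference, a rough sketch of what the repo-building and hosting steps can look like on a single VM. The gradle task names, directory, and port below are assumptions; list the exact targets with ./gradlew tasks in your Bigtop checkout.
# From the Bigtop source tree: build packages for selected components, then generate a repo
./gradlew hadoop-rpm zookeeper-rpm   # component task names are assumptions
./gradlew repo
# Serve the generated packages over HTTP so Ambari hosts can reach them
cd output
python -m SimpleHTTPServer 8000
# Use http://<this-host>:8000/ as the repo Base URL in the Ambari version file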

Can I use Spark without Hadoop for development environment?

I'm very new to the concepts of Big Data and related areas, sorry if I've made some mistake or typo.
I would like to understand Apache Spark and use it only on my computer, in a development / test environment. As Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I discard that? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.
What do I need:
Run all Spark features without problems, but on a single computer (my home computer).
Everything I develop on my computer with Spark should run on a future cluster without problems.
Is there any reason to use Hadoop or any other distributed file system for Spark if I will run it on my computer for testing purposes?
Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want to run Spark in a development environment.
Yes, you can install Spark without Hadoop.
Go through the official Spark documentation: http://spark.apache.org/docs/latest/spark-standalone.html
Rough steps (a command-line sketch follows the list):
Download a precompiled Spark release, or download the Spark source and build it locally
Extract the tar
Set the required environment variables
Run the start scripts
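A minimal sketch of those steps for a single machine. The version number and download mirror are assumptions; note that local mode needs no HDFS or cluster at all.
# Download and unpack a prebuilt Spark release (version/mirror are assumptions)
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
cd spark-2.2.0-bin-hadoop2.7
# Run an example job and a shell in local mode, using all local cores
./bin/run-example SparkPi 10
./bin/spark-shell --master 'local[*]'
# Or start a one-machine standalone "cluster": a master plus one worker
./sbin/start-master.sh
./sbin/start-slave.sh spark://$(hostname):7077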
Spark (without Hadoop) is available on the Spark download page.
URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
If this URL does not work, try to get it from the Spark download page.
This is not a proper answer to the original question.
Sorry, it is my fault.
If someone wants to run Spark using the "without Hadoop" distribution tar.gz,
there is an environment variable to set. This spark-env.sh worked for me:
#!/bin/sh
# Point Spark at the jars of an existing Hadoop installation
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
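For context, this is what the Spark documentation describes for its "Hadoop free" builds: with SPARK_DIST_CLASSPATH set, the without-hadoop package picks up the Hadoop jars already installed on the machine instead of shipping its own.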

Hadoop 2.7.1 Eclipse plugin creation

After reading almost all the previous posts and web links, I feel the need to write this post. I am unable to find any directory named ${YOUR_HADOOP_HOME}/src/contrib/eclipse-plugin in my Windows-based build of Hadoop 2.7.1.
I have downloaded an already-compiled build from a source, but as a matter of learning I want to build it myself. Is there any other way to get the source files for creating the Hadoop 2.7.1 Eclipse plugin? Or did I miss something when building my own Windows-based Hadoop? Please explain and, if possible, provide a source for a Windows 7 build environment.
Thanks

Vector.class doesn't exist in mahout-core 0.7-cdh4.2.1 jar, not able to run SimpleKMeansClustering example

Hi, I'm new to Mahout, so I was trying to run the SimpleKMeansClustering example from GitHub.
I downloaded the mahout-core jar from the Cloudera repository.
Now when I try to compile my code in Eclipse, I cannot find Vector.class in org.apache.mahout.math.
Please can you let me know where I can find Vector.class, or how to run a clustering example on my own.
Note: I'm using Hadoop CDH 4.2.1, and the Mahout example for the synthetic control data works fine; I'm just looking for custom code to cluster my own data, so that I can run it in Mahout.
Please help, we are on a deadline to showcase some Hadoop capability in machine learning.
Your classpath is incomplete.
You are missing the mahout-math jar.
Probably here:
/usr/lib/mahout/mahout-math-0.7.jar
That is the path on my system.
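If you want to verify outside Eclipse first, a quick check is to compile and run the example with both Mahout jars on the classpath explicitly. The jar paths, versions, and class name below are assumptions based on a typical CDH layout:
# Compile the example against mahout-core and mahout-math (paths/versions assumed)
javac -cp /usr/lib/mahout/mahout-core-0.7-cdh4.2.1.jar:/usr/lib/mahout/mahout-math-0.7.jar SimpleKMeansClustering.java
# Run it with the same jars plus the Hadoop client classpath
java -cp .:/usr/lib/mahout/mahout-core-0.7-cdh4.2.1.jar:/usr/lib/mahout/mahout-math-0.7.jar:$(hadoop classpath) SimpleKMeansClustering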
It does exist. This suggests you haven't set up your project correctly in Eclipse. It's hard to say how without any detail, but this has nothing to do with CDH or Mahout.

Hadoop 2.0. Does it support new MapReduce 2 only or both classic and new MapReduce?

Does Hadoop 2.0 support new MapReduce 2 (YARN) only or both classic and new MapReduce?
It supports both. The difference is that what you call classic MapReduce will now run as an application instead of being integrated as part of the framework.
YARN allows the community to build other MapReduce-like applications that run on the Hadoop platform.
An example of another MR2 application, called DistributedShell, can be found in the first link below.
Brief explanation of MR2 and YARN
The very bottom of this article states it explicitly: all anyone changing to MRv2 would have to do is recompile their source code, and it should run.
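As a rough sketch of what "just recompile" means in practice, assuming a Maven-based job whose Hadoop dependency version can be overridden (the property name, jar name, and class name are assumptions for your own build):
# Rebuild the existing MapReduce job against Hadoop 2.x client libraries
mvn clean package -Dhadoop.version=2.2.0
# Submit it exactly as before; on Hadoop 2 it runs as a YARN application
hadoop jar target/my-job.jar com.example.MyJob /input /output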
More Detailed Explanation

Resources