Hadoop 2.7.1 Eclipse plugin creation - hadoop

After reading almost all the previous posts and web links, I still need to ask this question. I cannot find any directory named ${YOUR_HADOOP_HOME}/src/contrib/eclipse-plugin in my Windows-based build of Hadoop 2.7.1.
I have downloaded an already compiled build from another source, but as a learning exercise I want to build it myself. Is there any other way to get the source files for creating the Hadoop 2.7.1 Eclipse plugin, or did I miss something when building my own Windows-based Hadoop? Please explain and, if possible, provide the source for a Windows 7 build environment.
Thanks

Related

Building Ambari HDP stacks from sources

I am trying to set up Ambari + HDP from sources (since Cloudera closed off the Hortonworks package repos). Can anyone share their experience or a how-to on this? Documentation is very scarce in this regard.
@alfheim the documentation is here:
https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.7.5
And a post with all the details:
Ambari 2.7.5 installation failure on CentOS 7
Be sure to get the correct versions of npm, Maven, Node, etc. There are some manual changes you may need to make inside the source files. You can find quite a few posts solving those issues here on the ambari tag. Go back to page 2 or 3 of the tag to find the most recent posts about building Ambari from source, or just search for any errors you hit during the build.

Can I use Spark without Hadoop for development environment?

I'm very new to the concepts of Big Data and related areas, so apologies for any mistakes or typos.
I would like to understand Apache Spark and use it only on my computer, in a development / test environment. Since Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I leave it out? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can only find Hadoop-dependent versions.
What I need:
Run all Spark features without problems, but on a single computer (my home computer).
Everything I build with Spark on my computer should later run on a cluster without problems.
Is there any reason to use Hadoop or another distributed file system for Spark if I will only run it on my computer for testing purposes?
Note that "Can apache spark run without hadoop?" is a different question from mine, because I want to run Spark in a development environment.
Yes, you can install Spark without Hadoop.
See the official Spark documentation: http://spark.apache.org/docs/latest/spark-standalone.html
Rough steps:
Download precompiled Spark, or download the Spark source and build it locally.
Extract the TAR archive.
Set the required environment variables.
Run the start script.
Spark (without Hadoop) is available on the Spark download page.
URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
If this URL does not work, get it from the Spark download page.
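The rough steps above could be sketched as the following shell helpers. This is only an illustration under stated assumptions: the variable names, the install directory and the archive.apache.org mirror are not from the answer, and the functions are defined but not called so that nothing is downloaded until you decide to run them.

```shell
#!/bin/sh
# Sketch only. SPARK_VERSION, SPARK_PACKAGE, INSTALL_DIR and the archive
# mirror are assumptions for illustration, not values from the answer.
SPARK_VERSION="2.2.0"
SPARK_PACKAGE="spark-${SPARK_VERSION}-bin-hadoop2.7"
INSTALL_DIR="${HOME}/opt"

# Build the download URL for the chosen release.
spark_tarball_url() {
    printf 'https://archive.apache.org/dist/spark/spark-%s/%s.tgz\n' \
        "$SPARK_VERSION" "$SPARK_PACKAGE"
}

# Steps 1 to 3: download, extract, and set the environment variables.
install_spark() {
    mkdir -p "$INSTALL_DIR" &&
    curl -fL -o "/tmp/${SPARK_PACKAGE}.tgz" "$(spark_tarball_url)" &&
    tar -xzf "/tmp/${SPARK_PACKAGE}.tgz" -C "$INSTALL_DIR" &&
    export SPARK_HOME="${INSTALL_DIR}/${SPARK_PACKAGE}" &&
    export PATH="${SPARK_HOME}/bin:${PATH}"
}

# Step 4: start an interactive shell in local mode, using every local core.
# For a standalone master, sbin/start-master.sh would be used instead.
run_spark_local() {
    "${SPARK_HOME}/bin/spark-shell" --master "local[*]"
}
```

Calling install_spark and then run_spark_local should give a single-machine Spark shell without any Hadoop daemons running; the same application code can later be submitted with a cluster master URL instead of local[*].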
This is not a proper answer to the original question.
Sorry, that is my fault.
If someone wants to run the Spark "without Hadoop" distribution tar.gz,
an environment variable has to be set. This spark-env.sh worked for me.
#!/bin/sh
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
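A slightly more defensive variant of the spark-env.sh above might look like the sketch below. The helper name hadoop_classpath_or_empty is hypothetical; the only assumption about Hadoop is that, when installed, a hadoop command on the PATH supports the classpath subcommand already used above.

```shell
#!/bin/sh
# conf/spark-env.sh sketch. hadoop_classpath_or_empty is a hypothetical helper.

# Print the Hadoop classpath if a `hadoop` command is available,
# otherwise print nothing and warn on stderr.
hadoop_classpath_or_empty() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop classpath
    else
        echo "warning: hadoop not found on PATH; SPARK_DIST_CLASSPATH left empty" >&2
    fi
}

SPARK_DIST_CLASSPATH="$(hadoop_classpath_or_empty)"
export SPARK_DIST_CLASSPATH
```

The guard only makes a missing Hadoop installation visible at startup; it does not make the "without Hadoop" build work on its own, since that build still expects Hadoop jars to be supplied through this variable.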

Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration

I need to develop a project on "Design and Evaluation of Network-Levitated Merge for Hadoop Acceleration", but I am new to Hadoop and have no idea about Hadoop projects or how to combine Hadoop functionality with a GUI.
Please guide me regarding this scenario.
It would help if I could get an idea for a Hadoop project.
Any simple upload and download project with Hadoop and GUI functionality will do for my task.
Actually, this is my M.Tech project. For this project you need to make changes in Hadoop 0.20.2, build it again, and check Hadoop's performance using TeraSort. The author uses an InfiniBand network. The author provided me with the code for this project, but it was compiled on an older version of Linux that is no longer updated, so I was not able to compile it. If you have an InfiniBand network, install OFED on your system. A patch and a manual are available at mellanox.com; download the manual and follow it.
If it works, let me know. Otherwise, let it go.
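The TeraSort check mentioned above could be sketched roughly as follows. The jar location, row count and HDFS directories are assumptions for illustration; the helper only prints the commands so they can be reviewed before anything is submitted to a cluster.

```shell
#!/bin/sh
# Sketch only. The jar path, row count and HDFS directories are assumptions.
EXAMPLES_JAR="${HADOOP_HOME:-/usr/local/hadoop}/hadoop-0.20.2-examples.jar"

# Print the three benchmark commands (generate, sort, validate)
# instead of running them, so they can be inspected first.
terasort_commands() {
    rows="$1"   # number of 100-byte rows to generate
    base="$2"   # HDFS working directory for the benchmark
    echo "hadoop jar $EXAMPLES_JAR teragen $rows $base/input"
    echo "hadoop jar $EXAMPLES_JAR terasort $base/input $base/output"
    echo "hadoop jar $EXAMPLES_JAR teravalidate $base/output $base/report"
}
```

For example, terasort_commands 10000000 /benchmarks/tera prints the commands for roughly 1 GB of input; timing the terasort step before and after the patch gives the comparison.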

Vector.class doesn't exist in Mahout-core 0.7 -cdh4.2.1 jar, not able to run SimpleKMeansClustering Example

Hi, I'm new to Mahout and was trying to run the SimpleKMeansClustering example from GitHub.
I downloaded the mahout-core jar from the Cloudera repository.
Now when I try to compile my code in Eclipse, I cannot find Vector.class in org.apache.mahout.math.
Could you let me know where I can find Vector.class, or how to run a clustering example on my own?
Note: I'm using Hadoop CDH4.2.1, and the Mahout example for the synthetic control data works fine, so I'm just looking for custom code to cluster my own data in Mahout.
Please help; we are on a deadline to showcase some capability of Hadoop in machine learning.
Your classpath is incomplete.
You are missing the mahout-math jar.
Probably here:
/usr/lib/mahout/mahout-math-0.7.jar
That is the path on my system.
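If the jar is not at that path, one way to locate it is to search the installed jars for the class entry. The sketch below assumes unzip is installed; the function names are hypothetical, and the directory is whatever holds your Mahout jars.

```shell
#!/bin/sh
# Sketch only. class_to_entry and find_class_jar are hypothetical helpers.

# Convert a fully qualified class name to its path inside a jar,
# e.g. org.apache.mahout.math.Vector -> org/apache/mahout/math/Vector.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar in directory $2 that contains class $1.
find_class_jar() {
    entry="$(class_to_entry "$1")"
    for jar in "$2"/*.jar; do
        [ -f "$jar" ] || continue   # skip when the glob matched nothing
        if unzip -l "$jar" 2>/dev/null | grep -F -q "$entry"; then
            printf '%s\n' "$jar"
        fi
    done
}
```

Running find_class_jar org.apache.mahout.math.Vector /usr/lib/mahout should list the jar to add to the Eclipse build path, assuming the jars live in that directory.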
It does exist. This suggests you haven't set up your project correctly in Eclipse. It's hard to say how without any detail. But this has nothing to do with CDH or Mahout.

Getting started with Hadoop and Eclipse

I'm following a couple of tutorials for setting up Hadoop with Eclipse.
This one is from Cloudera: http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
But this seems to focus on checking out the latest code from Hadoop and tweaking it.
That is rarely needed, though; surely the latest release of Hadoop will suffice for most users' needs?
Whereas this tutorial seems to focus on setting up and running Hadoop:
http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
I just want to run some basic MapReduce jobs to get started. I don't think I should be using the latest code from Hadoop, as Cloudera specifies in the first link above, just to get started?
Here is a blog entry and screencast on developing/debugging applications in Eclipse. The procedure works with most versions of Hadoop.
You may try this tutorial on installing Hadoop plugin for eclipse: http://bigsonata.com/?p=168
