Hadoop, CouchDB, Elasticsearch

I have already installed CouchDB (1.1.0) and Elasticsearch (0.17.6) on my Fedora machine. I now want to install Hadoop MapReduce (http://hadoop.apache.org/mapreduce/) and Hadoop DFS (http://hadoop.apache.org/hdfs/) on the same machine, but I wonder whether there will be any conflicts or problems between them. Will Elasticsearch and CouchDB still function properly?
Thanks for your answers

I see no reason for conflict. I wouldn't put all of these on one production machine because of performance concerns, but if it's your development box, then go ahead.
CouchDB is a project written in Erlang that uses Mozilla's SpiderMonkey for executing JavaScript queries.
Hadoop is pure Java and will not conflict with the above in any way.
Elasticsearch and Lucene are also Java, and they won't conflict with Hadoop because their startup scripts define specific classpaths, so multiple installed versions of the same libraries shouldn't create an issue.
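For a quick sanity check on a single dev box, something like the following should confirm all three coexist; the ports are the stock defaults for CouchDB and Elasticsearch, and it assumes each service has already been started.
#!/bin/sh
# Each service listens on its own default port and runs in its own process,
# so they should not clash on one machine.
curl -s http://localhost:5984/    # CouchDB REST API (default port 5984)
curl -s http://localhost:9200/    # Elasticsearch HTTP API (default port 9200)
hadoop version                    # Hadoop CLI, a separate JVM entirely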

Related

Building Ambari HDP stacks from sources

I am trying to setup Ambari + HDP from sources (since Cloudera closed off Hortonworks package repos). Can anyone share experience / howto on this? Documentation is very scarce in this regard.
@alfheim, the documentation is here:
https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.7.5
And a post with all the details:
Ambari 2.7.5 installation failure on CentOS 7
Be sure to get the correct versions of npm, Maven, Node, etc. There are some manual changes you may need to make inside the source files. You can find quite a few posts solving those issues here under the ambari tag; go back to page 2 or 3 to find the most recent posts on building Ambari from source, or just search for any errors you hit during the build.
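For reference, here is a rough sketch of the RPM build path described in that guide. It assumes the Ambari 2.7.5 source tarball is already unpacked and that Maven, npm, and Node are installed; the exact flags and version strings may need adjusting for your setup.
#!/bin/sh
# from the unpacked source directory
cd apache-ambari-2.7.5-src
mvn versions:set -DnewVersion=2.7.5.0.0
cd ambari-metrics && mvn versions:set -DnewVersion=2.7.5.0.0 && cd ..
# rpm:rpm produces CentOS/RHEL packages; -DbuildNumber is normally the git
# commit hash of your source checkout (placeholder value used here)
mvn -B clean install rpm:rpm -DnewVersion=2.7.5.0.0 -DbuildNumber=0 \
    -DskipTests -Dpython.ver="python >= 2.6"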

Installing Spark through Ambari

I've configured cluster of VMs via Ambari.
Now trying to install Spark.
In all the tutorials (e.g. here) it looks pretty simple; installing Spark is just like adding any other service:
But it appears that in my Ambari instance there is simply no such entry.
How can I add Spark entry to Ambari services?
There should be a SPARK folder under the /var/lib/ambari-server/resources/stacks/HDP/2.2/services directory. Additionally, there should be spark folders, identified by their version numbers, under /var/lib/ambari-server/resources/common-services/SPARK. Either someone modified your environment or it's a bad and/or non-standard install of Ambari.
I would suggest re-installing, as it is hard to say exactly what you need to add when it's unclear what else may be missing from the environment.
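A quick way to check is something like this, using the paths above (the HDP version directory may differ on your install); if both come back empty, that supports the broken-install theory.
ls /var/lib/ambari-server/resources/stacks/HDP/2.2/services | grep -i spark
ls /var/lib/ambari-server/resources/common-services | grep -i spark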

Can I use Spark without Hadoop for development environment?

I'm very new to the concepts of Big Data and related areas, so sorry if I've made a mistake or typo.
I would like to understand Apache Spark and use it only on my computer, in a development/test environment. As Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I discard it? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.
What I need:
Run all features of Spark without problems, but on a single computer (my home computer).
Everything that I do on my computer with Spark should run on a future cluster without problems.
Is there any reason to use Hadoop or any other distributed file system for Spark if I will run it on my computer only for testing purposes?
Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want to run Spark in a development environment.
Yes, you can install Spark without Hadoop.
Go through the Spark official documentation: http://spark.apache.org/docs/latest/spark-standalone.html
Rough steps (see the sketch after this list):
Download a precompiled Spark release, or download the Spark source and build it locally
Extract the tar
Set the required environment variables
Run the start script
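A minimal sketch of those steps for a single machine (the version number and mirror URL are only illustrative; pick the current release from the download page):
#!/bin/sh
# 1. Download a prebuilt Spark release and unpack it
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar xzf spark-2.2.0-bin-hadoop2.7.tgz
# 2. Set the environment variables Spark expects
export SPARK_HOME="$PWD/spark-2.2.0-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
# 3. Start a standalone master and a worker pointed at it
#    (newer Spark versions call the second script start-worker.sh)
start-master.sh
start-slave.sh "spark://$(hostname):7077"
# 4. Or skip the standalone daemons entirely and just use local mode
spark-shell --master "local[*]"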
Spark (without Hadoop) is available on the Spark download page.
URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
If this URL does not work, try to get it from the Spark download page.
This is not a proper answer to the original question. Sorry, it is my fault.
If someone wants to run Spark with the "without Hadoop" distribution tar.gz,
there is an environment variable to set. This spark-env.sh worked for me:
#!/bin/sh
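# make the Hadoop-free Spark build pick up the jars from an existing Hadoop install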
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
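For reference, this file goes in $SPARK_HOME/conf/spark-env.sh, and the hadoop command has to be on your PATH for hadoop classpath to resolve.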

Hadoop 2.7.1 Eclipse plugin creation

After reading almost all the previous posts and web links, I feel the need to write this post. I am unable to find any directory named ${YOUR_HADOOP_HOME}/src/contrib/eclipse-plugin in my Windows-based build of Hadoop 2.7.1.
I have already downloaded a compiled build from a source, but as a matter of learning I want to build it myself. Is there any other way to get the source files for creating the Hadoop 2.7.1 Eclipse plugin? Or did I miss something when building my own Windows-based Hadoop? Please explain and, if possible, provide a source for a Windows 7 build environment.
Thanks

Getting started with Hadoop and Eclipse

I'm following a couple of tutorials for setting up Hadoop with Eclipse.
This one is from Cloudera: http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
But this seems to focus on checking out the latest code from Hadoop and tweaking it.
That seems unusual though; shouldn't the latest release of Hadoop suffice for most users' needs?
Whereas this tutorial seems to focus on setting up and running hadoop :
http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
I just want to run some basic MapReduce jobs to get started. I don't think I should need to use the latest code from Hadoop, as Cloudera specifies in the first link above, just to get started?
Here is a blog entry and screencast on developing/debugging applications in Eclipse. The procedure works with most versions of Hadoop.
You may try this tutorial on installing Hadoop plugin for eclipse: http://bigsonata.com/?p=168
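Independent of Eclipse, once Hadoop itself is running you can try a basic MapReduce job with the examples jar that ships with Hadoop; the jar path below assumes the standard Hadoop 2.x layout, and the input/output directories are just placeholders.
#!/bin/sh
# copy some text files into HDFS and run the bundled WordCount example
hdfs dfs -mkdir -p /user/$USER/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/$USER/input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /user/$USER/input /user/$USER/output
hdfs dfs -cat /user/$USER/output/part-r-00000 | head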
