We are trying to figure out which Linux distribution is best suited for Nutch-Hadoop integration.
We are planning to use clusters for crawling large amounts of content with Nutch.
Let me know if you need more clarification on this question.
Thank you.
There is not much difference between the major Linux distributions in this case, but I'd recommend one that has Hadoop packages already prepared. I'm using Cloudera's Hadoop distribution on Debian and it works very well.
Hadoop and HBase packages will be in the next Debian stable release:
http://packages.debian.org/search?keywords=hadoop
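If you want to sanity-check whichever distribution's packages you end up installing, a minimal smoke test against the HDFS Java API is enough. This is just a sketch, assuming the Hadoop client jars and configuration directory are on the classpath; with no fs.default.name / fs.defaultFS configured it simply falls back to the local filesystem:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hedged smoke test: list the filesystem root to confirm the installed
    // Hadoop packages and their client libraries are usable from Java.
    public class HadoopSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }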
I recently installed Hadoop 2 with the YARN configuration. I am planning to install the Hadoop ecosystem stack, such as Pig, Hive, HBase, Oozie, ZooKeeper, etc. I would like to know if I should install these tools from the same link that I used for the Hadoop 1.0 configuration. If not, could anyone please send me the link for the Hadoop 2 configuration of these tools? I heard that Pig and Hive are faster on Hadoop 2.0, so I would also like to know if there are better versions.
Thanks,
Gautham
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-1/CDH4-Installation-Guide/cdh4ig_topic_16_2.html
This may be useful.
Also, I think the configuration isn't different from v1.
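On the client side, at least, job code written against the org.apache.hadoop.mapreduce API carries over as well. Here is a minimal sketch (the input and output paths are just placeholder arguments) of an identity job that runs unchanged under MRv1 and under YARN, since on Hadoop 2 the execution framework is picked by the cluster's mapreduce.framework.name setting rather than by the code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Minimal identity MapReduce job; no mapper or reducer is set, so the
    // defaults (identity Mapper and Reducer) are used. The same code compiles
    // and runs on Hadoop 1.x and Hadoop 2.x.
    public class IdentityJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "identity-job"); // Job.getInstance(conf) on Hadoop 2, this form works on both
            job.setJarByClass(IdentityJob.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }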
I want to use large-scale machine learning algorithms, and I want to use Mahout for this task, but it seems Mahout depends on Hadoop, and Hadoop is distributed only as Linux packages.
Is there any way to use Mahout/Hadoop on Windows?
Or maybe someone can suggest some alternatives?
There are multiple Hadoop vendors already. Hortonworks is one of them and has released a version of their platform for Windows: HDP on Windows.
Mahout should be able to run on top of this!
Alternatively, there is also Datameer, which you have to pay for (unless you are coming from academia), with its Smart Analytics feature!
I am new to HBase and Hadoop and need compatible versions of HBase and Hadoop to run my experiments.
The current stable version at http://apache.techartifact.com/mirror/hbase/ is hbase-0.94.1. Can anybody kindly tell me which version of Hadoop I should use so that there are no compatibility issues and no future data loss?
Please suggest from the Hadoop and HBase releases that are currently available online.
Below are the sites I am using for downloading these releases:
http://apache.techartifact.com/mirror/hadoop/common/ (hadoop)
http://apache.techartifact.com/mirror/hbase/ (hbase)
If you want to be sure about the compatibility of the Hadoop and HBase distribution you are using, you might consider using the Apache Bigtop project or the Cloudera CDH package.
Bigtop:
The primary goal of Bigtop is to build a community around the
packaging and interoperability testing of Hadoop-related projects.
This includes testing at various levels (packaging, platform, runtime,
upgrade, etc...) developed by a community with a focus on the system
as a whole, rather than individual projects.
Cloudera:
CDH consists of 100% open source Apache Hadoop plus nine other open
source projects from the Hadoop ecosystem. CDH is thoroughly tested
and certified to integrate with the widest range of operating systems
and hardware, databases and data warehouses, and business intelligence
and ETL systems.
Download both from the stable folder. I do not know whether some versions are incompatible with others.
I use hadoop-0.20.203.0 with hbase-0.94.1 without any problems.
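Whatever combination you settle on, a small client round trip is a quick way to confirm that the HBase release actually works against the Hadoop jars underneath it. A minimal sketch against the 0.94-era client API; the table name 'test' and column family 'cf' are assumptions, and the table has to exist already (e.g. create 'test', 'cf' from the HBase shell):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hedged compatibility check: write one cell and read it back.
    // Assumes hbase-site.xml is on the classpath and table 'test' with
    // column family 'cf' already exists.
    public class HBaseRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test");
            try {
                Put put = new Put(Bytes.toBytes("row1"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("hello"));
                table.put(put);
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));
            } finally {
                table.close();
            }
        }
    }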
We are planning to learn and use Hadoop for small prototyping. Not in production for now.
Which distribution of Hadoop is good to start with?
We have already installed Hadoop 0.20.2 and are practicing with it, but the Apache distribution is not compatible with Sqoop and some other things. To learn about security we have to go for Hadoop 1.0.0, which is currently in beta.
Could you please tell us which distribution and version of Hadoop is good for learning and prototyping?
Download the code from trunk (or the 0.23 branch) and do a build; it should have all the features. Otherwise, download the 0.23 release, which should have all the features to date except MRv1.
I'm interested in playing around with Mahout a bit (and by proxy Hadoop), and I'm wondering if anyone has experience installing these projects locally. I know that Mahout is implemented in Java, but I'm not really sure about Hadoop. I read a bit about using them on Amazon or Rackspace, but I have my heart set on testing locally.
You do not need to download or install Hadoop to use Mahout locally, even its Hadoop-based bits. You do need to use Maven, which will manage downloading the dependencies. The Mahout command line, and examples involving mvn, should all just work.
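To make the "runs locally" point concrete, here is a minimal sketch using Mahout's Taste recommender API, which needs nothing but a plain JVM and the Mahout jars Maven pulls in (for older releases, the org.apache.mahout:mahout-core artifact). The ratings.csv file, with lines of the form userID,itemID,rating, is an assumption for illustration:

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    // Hedged local-only sketch: a user-based recommender built from a flat file.
    // "ratings.csv" is a placeholder; nothing here touches Hadoop.
    public class LocalRecommender {
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ratings.csv"));
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            List<RecommendedItem> items = recommender.recommend(1L, 3); // top 3 items for user 1
            for (RecommendedItem item : items) {
                System.out.println(item.getItemID() + " -> " + item.getValue());
            }
        }
    }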