I'm using CDH 4.7.0 and will be installing Flume to feed HDFS data. I also downloaded Flume v1.4.0 from Apache (the same version that CDH comes with). There seem to be two flume-ng-core files: the one that comes with CDH and the one from Apache. Their versions are 1.4.0-cdh4.7.0 and 1.4.0 respectively. Should I be using 1.4.0-cdh4.7.0, or can I safely use 1.4.0?
Flume 1.4.0 and Flume 1.4.0-cdh4.7.0 are the same Flume release, but 1.4.0-cdh4.7.0 is compiled and tested with CDH 4.7.0, so it is the low-risk choice on a CDH 4.7.0 cluster.
Hence I recommend using the cdh4.7.0 build of Flume with your CDH 4.7.0 installation.
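To make the naming convention concrete, here is a minimal sketch that splits a version string such as 1.4.0-cdh4.7.0 into its upstream part and its CDH suffix, then checks whether an artifact was built for a given CDH release. The function names are hypothetical, and the pattern only assumes the "<upstream>-cdh<release>" shape seen in the question.

```python
import re

# Hypothetical helper: assumes versions look like "1.4.0" or "1.4.0-cdh4.7.0".
_VERSION = re.compile(r"(\d+(?:\.\d+)*)(?:-cdh(\d+(?:\.\d+)*))?")


def split_cdh_version(version):
    """Return (upstream_version, cdh_release); cdh_release is None for plain Apache builds."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return match.group(1), match.group(2)


def built_for_cluster(artifact_version, cluster_cdh):
    """True only when the artifact carries a CDH suffix equal to the cluster's CDH release."""
    _, cdh_release = split_cdh_version(artifact_version)
    return cdh_release == cluster_cdh
```

With these definitions, built_for_cluster("1.4.0-cdh4.7.0", "4.7.0") is true, while the plain Apache 1.4.0 build has no CDH suffix and does not match.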
Related
I'm planning to build a web crawler using Nutch and Solr. Which version of Hadoop should I install to work with Nutch 1.15?
Nutch 1.15 is built with Hadoop 2.2.0, but it also runs on Hadoop installations using higher versions of Hadoop 2.x and 3.x.
I am new to HDP installation using Ambari. I want to install Hadoop 2.9.0 using Ambari web installation. My Ambari version is 2.7.0.0 and I am using HDP 3.0 which has Hadoop 3.1.0. But I need to install Hadoop 2.9.0. Can someone please let me know if this can be done? And how can this be achieved?
I have not started the cluster installation yet and I'm done with Ambari installation.
Ambari uses pre-defined software stacks.
HDP does not offer any stack with Hadoop 2.9.0.
You would therefore need to install that version of Hadoop manually, although you can still manage the servers (but not the Hadoop configuration) using Ambari.
In any case, there's little benefit to installing a lower version of the software, and you won't get Hortonworks support if you do that.
I have an installed CDH cluster and ran hadoop version, but it returns only the Hadoop version. Is there a way to see the version numbers of all installed components, perhaps in a graphical interface? And which command returns, for example, the Spark version number?
Open CM (hostname:portnumber) -> Hosts tab -> Host Inspector to find which versions of CM and CDH are installed across all hosts in the cluster, as well as a list of the installed CDH components with version details.
The Spark version can be checked using
spark-submit --version
Spark was developed separately from Hadoop HDFS and Hadoop MapReduce as a standalone tool that can be used alongside Hadoop; as such, most of its interfaces are different from Hadoop's.
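If you want to collect these from a script instead of typing each command, a rough sketch along the following lines may help. It only wraps the two commands mentioned in this thread (hadoop version and spark-submit --version); the helper name tool_version is hypothetical, and standard error is merged into the output on the assumption that some tools print their version banner there.

```python
import shutil
import subprocess


def tool_version(command, flag="--version"):
    """Return whatever `command flag` prints, or None if the command is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, flag],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case the banner is printed there
        text=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Note that hadoop takes "version" as a subcommand, not a --version flag.
    for command, flag in (("hadoop", "version"), ("spark-submit", "--version")):
        output = tool_version(command, flag)
        print(f"== {command} ==")
        print(output if output is not None else "not found on PATH")
```

Run it on a cluster host where the client binaries are installed; on any other machine it simply reports that the commands were not found.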
I am trying to upgrade my Hadoop infrastructure installed on Ubuntu 14.04 from hadoop-2.2.0-stable to hadoop-2.6.0-stable. Am I supposed to remove my previous version of Hadoop and install a fresh copy, or is it possible to move to a newer version without losing any data? Any help is appreciated.
I am using Cloudera Manager with CDH4.2.2 for my 3+1 cluster. On starting the installation with Cloudera Manager, it automatically downloads and installs JDK1.6. I want to use JDK1.7 with CDH for my convenience. Is that possible, or is there any version of CDH which, while installing Hadoop in the cluster, automatically downloads and installs JDK1.7 and successfully runs Hadoop with it?
If yes, may I know which version of CDH it is and where I can download it from?
I want to work with JDK1.7 instead of 1.6 because I want to install Apache Giraph on CDH, but it seems Giraph does not work well with JDK1.6 and needs JDK1.7.
With Regards,
JDK 1.7 is supported for all CDH applications as of CDH 4.4 and Cloudera Manager 4.7.
That being said, no version of Cloudera Manager 4.x installs JDK 1.7 during the installation (latest version is 4.8.2). The only version of Cloudera Manager that installs JDK 1.7 automatically is 5.0.0.
To summarize: If you want an automated installation of JDK 1.7 via Cloudera Manager, you need to upgrade to CDH 5, and CM 5.0.0. Alternatively, you could upgrade to CDH4.4, and then perform a manual installation of JDK 1.7.
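The summary above can be written down as a small decision function. This is only a sketch of the rules stated in this answer, not anything Cloudera ships: the function name and return labels are made up, versions are passed as tuples, and treating every CM release from 5.0.0 onward as auto-installing JDK 1.7 is an assumption (the answer only names 5.0.0).

```python
def jdk7_option(cdh, cm):
    """Classify how JDK 1.7 can be obtained for given CDH and CM versions.

    cdh and cm are version tuples such as (4, 2, 2). Returns one of
    "automatic", "manual" or "upgrade-needed" (hypothetical labels).
    """
    # Assumption: CM 5.0.0 and anything later installs JDK 1.7 automatically.
    if cdh >= (5,) and cm >= (5, 0, 0):
        return "automatic"
    # JDK 1.7 is supported from CDH 4.4 with CM 4.7, but must be installed by hand.
    if cdh >= (4, 4) and cm >= (4, 7):
        return "manual"
    return "upgrade-needed"


if __name__ == "__main__":
    print(jdk7_option((4, 2, 2), (4, 5, 0)))  # an older CDH 4.2.2 setup
    print(jdk7_option((4, 4, 0), (4, 7, 0)))
    print(jdk7_option((5, 0, 0), (5, 0, 0)))
```

The CM version (4, 5, 0) in the first call is just an illustrative value; the question does not say which CM release is in use.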