Upgrading Cloudera Hadoop from CDH3u6 to CDH5 - hadoop

In my project, we want to upgrade Hadoop and Hive from CDH3 to a higher version.
We had planned to upgrade from CDH3 to CDH4 first, but recent enhancements suggest that CDH5 is the more stable version.
So we now plan to upgrade to CDH5.
Where can I find documentation for doing this?

It looks like you have to go from CDH3 to CDH4 and then from CDH4 to CDH5. I don't know which components you use on CDH3, but be careful: HBase changes significantly between CDH4 and CDH5. Best of luck to you! Here is the link to the doc:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_upgrade_cdh3_to_4.html
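To make the two-step path concrete, here is a minimal sketch that lists the hops between two CDH major versions. It assumes, as the answer above suggests, that upgrades go one major version at a time; the function name `upgrade_hops` is hypothetical, and nothing here calls Cloudera tooling.

```python
from typing import List, Tuple

# Ordered CDH major releises discussed in this thread. The one-hop-at-a-time
# rule is an assumption taken from the answer above.
CDH_MAJORS = [3, 4, 5]


def upgrade_hops(current: int, target: int) -> List[Tuple[int, int]]:
    """Return the (from, to) major-version hops needed to reach target."""
    if current not in CDH_MAJORS or target not in CDH_MAJORS:
        raise ValueError("unknown CDH major version")
    if target < current:
        raise ValueError("downgrades are not covered by this sketch")
    start = CDH_MAJORS.index(current)
    end = CDH_MAJORS.index(target)
    # Pair each release with its immediate successor along the path.
    return [(CDH_MAJORS[i], CDH_MAJORS[i + 1]) for i in range(start, end)]


for src, dst in upgrade_hops(3, 5):
    print(f"upgrade CDH{src} -> CDH{dst}")
```

Each hop should be planned and verified as a separate upgrade, following the documentation for that specific pair of versions.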

Related

Hadoop versions seem to fall under 0.x, 1.x, and 2.x, but when discussing YARN/MapReduce, every page refers to Hadoop 1 and Hadoop 2.0

On Apache's distribution page, Hadoop seems to exist in 0.x, 1.x, and 2.x. However, when discussing MapReduce/YARN and deciding on a version of Hive and HBase, there only seems to be discussion of Hadoop 1 and 2. Why is this? Is 0.x just a beta release?
The 1.x and 2.x versions derive from the 0.x line, which is still being continued (as far as I know). The version numbering is quite confusing. A helpful chart can be found at https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know . Although it is quite outdated, it shows the relevant branches and what derives from what.
Also see the question "Hadoop release version confusing" for more explanation.

Which hadoop version should I choose among 1.x, 2.2 and 0.23

Hello, I am new to Hadoop and pretty confused by the version names. Which one should I use among 1.x (great support and learning resources), 2.2 and 0.23?
I have read that Hadoop is moving to YARN completely from v0.23 ( link1 ). But at the same time it's all over the web that Hadoop v2.0 is moving to YARN ( link2 ), and I can see the YARN configuration files in Hadoop 2.2 itself.
Since 0.23 seems to be the latest version to me, does 2.2 also support YARN? (See link 1, which says Hadoop will support YARN from v0.23.)
As a beginner, which version should I go for, 1.x or 2.x, for learning Hadoop?
Are other technologies that work with Hadoop, like Pig and Hive, available for the latest version of Hadoop?
Thanks.
UPDATE
Thank you all for replying.
I ended up using Hadoop 2.2. Most of the well-known tutorials and resources are outdated, but I found one good book for getting started with v2.2:
"Hadoop: The Definitive Guide, Third Edition" by Tom White, which covers Hadoop v2.2.
The source code is available on GitHub: https://github.com/tomwhite/hadoop-book
As mentioned there, this version of the code has been tested with:
* Hadoop 1.2.1/0.22.0/0.23.x/2.2.0
* Avro 1.5.4
* Pig 0.9.1
* Hive 0.8.0
* HBase 0.90.4/0.94.15
* ZooKeeper 3.4.2
* Sqoop 1.4.0-incubating
* MRUnit 0.8.0-incubating
Hope it helps!
There are a few active release series. The 1.x release series is a continuation of the 0.20 release series. A few weeks after 0.23 was released, the 0.20 branch formerly known as 0.20.205 was renumbered 1.0. There is next to no functional difference between 0.20.205 and 1.0; it is just a renumbering.
The 0.23 release includes several major new features, including a new MapReduce runtime called MapReduce 2. It is implemented on a new system called YARN (Yet Another Resource Negotiator), a general resource management system for running distributed applications. Similarly, the 2.x release series is a continuation of the 0.23 release series, so 2.2 also supports YARN.
According to the Hadoop 2.2 release notes:
1.2.X - current stable version, 1.2 release
2.2.X - current stable 2.x version
0.23.X - similar to 2.X.X but missing NN HA.
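The lineage described in this answer can be written down as a small lookup. This is only a sketch of the answer's own statements (1.x continues 0.20, 2.x continues 0.23, YARN arrives with 0.23); the function name `hadoop_line` and the returned labels are made up for illustration, and series the answer does not discuss, such as 0.21 and 0.22, are deliberately rejected.

```python
def hadoop_line(version: str) -> dict:
    """Classify an Apache Hadoop version string by release series."""
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # 0.20.x was renumbered to 1.0, so both belong to the same line.
    if (major == 0 and minor == 20) or major == 1:
        return {"series": "1.x (0.20 line)", "yarn": False}
    # 2.x continues 0.23, which introduced MapReduce 2 on YARN.
    if (major == 0 and minor == 23) or major == 2:
        return {"series": "2.x (0.23 line)", "yarn": True}
    raise ValueError("series not covered by this sketch: " + version)


for v in ("0.20.205", "1.2.1", "0.23.9", "2.2.0"):
    print(v, hadoop_line(v))
```

Note that this says nothing about features that differ inside a line, such as NameNode HA being absent from 0.23.x.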
I would suggest starting with the Cloudera distribution since you are just starting to learn. CDH 4.5 includes the YARN feature you are looking for. You can also try the Hortonworks distribution. The advantage of going with these vendors is that you do not need to worry about which versions of components such as Hive and Pig work with your Hadoop installation.
I recommend starting with hadoop-2.2.0, which gives you a good grounding. Industry prefers YARN, and production clusters mostly run 2.x.

What are the differences between hadoop versions?

What is the difference between Hadoop versions 0.x, 1.x and 2.x? Also, can someone tell me how CDH 3 and 4 differ?
Cloudera provides an extensive list of new features and changes in each release:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Release-Notes/CDH4-Release-Notes.html
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Release-Notes/cdh4rn_topic_2.html
You can also look at the list of individual changes relative to the vanilla Apache Hadoop releases:
http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.7.0.CHANGES.txt

Which version of Sqoop works with Hadoop 0.20.2?

Does Sqoop 2 work with Hadoop 0.20.2?
Which version of Sqoop is best to download: 1.4.2 or 1.99.1?
Thanks!
Sqoop currently has two main branches. Sqoop 1 is the older, fully functional and mature project, supporting Hadoop 0.20, 1.x, 0.23 and 2.0.x. You can download the bits from here. Please make sure that you download the file ending with "_hadoop-0.20", otherwise you will get weird exceptions.
The second branch is Sqoop 2, which is a redesign of the project. A first cut is available as version 1.99.3. This branch supports only Hadoop 1.x and 2.x and can be downloaded from here. Again, make sure to download the version that matches your Hadoop distribution. The build for Hadoop 1.x will probably work on 0.20.2 as well, since those versions are not that different, but nobody has verified that.
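The advice to pick the archive built for your Hadoop line can be checked mechanically before unpacking. The sketch below only inspects the file name for the "_hadoop-0.20" style suffix mentioned above; the helper name `matches_hadoop` and the example file names are illustrative assumptions, so compare them against what the download page actually lists.

```python
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def matches_hadoop(filename: str, hadoop_tag: str) -> bool:
    """True if the archive name, minus its extension, ends with _hadoop-<tag>."""
    stem = filename
    for suffix in ARCHIVE_SUFFIXES:
        if stem.endswith(suffix):
            # Drop the archive extension so only the build tag remains at the end.
            stem = stem[: -len(suffix)]
            break
    return stem.endswith("_hadoop-" + hadoop_tag)


# Illustrative file names, not taken from the answer.
print(matches_hadoop("sqoop-1.4.2.bin__hadoop-0.20.tar.gz", "0.20"))
print(matches_hadoop("sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz", "0.20"))
```

A name check like this only catches an obviously wrong download; it does not prove that a given build works on 0.20.2.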

Apache Hadoop versions 2.0 vs. 0.23

There are so many Hadoop versions and distributions that I am confused. I have a few questions.
Does Apache Hadoop 1.x derive from 0.20.205?
Does Apache Hadoop 2.0 derive from 0.22 or 0.23?
According to this blogpost from Cloudera:
There is next to no functional difference between 0.20.205 and 1.0.
This is just a renumbering.
Hadoop's YARN site states:
MapReduce has undergone a complete overhaul in hadoop-0.23 and we now
have, what we call, MapReduce 2.0 (MRv2) or YARN
It's also worth having a look at this diagram, which shows the tree of different Hadoop versions as well as the third-party distributions built on top of them.
Updated answer:
http://elephantscale.com/hadoop2_handbook/Hadoop_Versions.html
(Disclaimer: I am a co-author of this online book.)
Hadoop release 1.0.0 derives from 0.20.x.
As a rule of thumb, remember:
1.x corresponds to 0.20.x
2.x corresponds to releases after 0.20.x
This makes it easy to remember and choose the correct Apache distribution for a Hadoop cluster setup.
