Hadoop version layout error - hadoop

I am getting this error on one or two nodes of a Hadoop cluster, while the rest of the nodes are running fine:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
*********** Upgrade is not supported from this older version of storage to
the current version. Please upgrade to Hadoop-0.14 or a later version and
then upgrade to current version. Old layout version is 'too old' and latest
layout version this software version can upgrade from is -7.
Any idea how to fix this problem, without losing the data?

Related

Upgrade Elasticsearch to the latest version

I want to upgrade my ES cluster (current version: 7.6.2) to the latest version (7.15.2 as of now).
Is it OK to upgrade directly to the latest version, or should I upgrade in 2 or 3 steps through intermediate versions?
For instance, MongoDB has to be upgraded step by step, one version at a time...
I just want to know the policy, or at least the best practice, for upgrading ES.
Thanks
It's totally fine to upgrade between minor versions (7.x -> 7.y), as well as one major version up from the latest minor version (6.8 -> 7.y); see the upgrade documentation for details. Best of all, it can be a rolling upgrade, so you can upgrade the nodes in the cluster one by one without cluster downtime or data loss. Just make sure cluster health is green before moving on to the next node.
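As a rough illustration of the "health is green before the next node" check, here is a minimal Python sketch. It assumes the cluster is reachable over plain HTTP at `http://localhost:9200` without authentication (both assumptions, adjust for your setup), and the function names `is_green` and `fetch_health` are made up for this example. Treating any relocating, initializing, or unassigned shards as "not ready" is a conservative choice on top of the status colour, not something the answer requires.

```python
import json
import urllib.request


def is_green(health: dict) -> bool:
    """Return True when the health document reports green and no shards are in flight."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the cluster health document (base_url is an assumed, unauthenticated endpoint)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    health = fetch_health()
    if is_green(health):
        print("Cluster is green; safe to move on to the next node.")
    else:
        print(f"Cluster status is {health.get('status')}; wait before continuing.")
```

Running it between node restarts gives a quick go/no-go signal; in practice you would probably poll until it turns green rather than check once.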

How are CDH packages defined?

I have some questions about CDH and how it is maintained:
When I go to the packaging information for a specific CDH version, I can check the package version of each component (for instance, for CDH 5.5.5: https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_55.html#cdh_555 ). However, I don't understand what the "package version" refers to exactly. For instance, for the Apache Parquet component, the "package version" is parquet-1.5.0+cdh5.5.5+181. How can I find out exactly what source code is packaged? Does this correspond to a label in a specific repo? If I go to the "official" Apache Parquet repo, there is no "cdh5.5.5" branch; the closest thing I can find is a tag called "1.5.0" ( https://github.com/apache/parquet-mr/tree/parquet-1.5.0 ). How do the CDH people know what parquet-1.5.0+cdh5.5.5+181 refers to exactly?
Still concerning Apache Parquet, how come even the most recent CDH versions are still using Apache Parquet 1.5.0? The date on that tag is 22 May 2014, i.e. more than 3 years ago. Why don't they upgrade to a newer version, like 1.6.0? The reason I'm asking is that there is a bug in 1.5.0 that was fixed more than 3 years ago in Parquet 1.6.0, yet the latest CDH version is still using 1.5.0. Is there a reason why they keep using a really old, buggy version?
Thanks!
You are correct in assuming that parquet-1.5.0+cdh5.5.5+181 is closest to Parquet 1.5.0. However, the code will not be identical to upstream Parquet 1.5.0, because:
CDH enforces cross-component compatibility. Code and applications using parquet-1.5.0 must also work with all the other Hadoop services (HDFS, Hive, Oozie, YARN, Spark, Solr, HBase). Any incompatibilities have to be fixed, so the packaged Parquet code includes those bug fixes.
CDH enforces major-version compatibility. This means an application written on CDH5.1 should still work on CDH5.5 and CDH5.7, i.e. on all CDH5.x versions. This also alters the codebase.
The best way to interpret this is to say that parquet-1.5.0+cdh5.5.5+181 will support all features provided in Parquet 1.5.0 and will also work with the corresponding Hadoop services packaged with CDH5.5.
Version compatibility is also the reason why CDH Hadoop services run older versions of the related upstream projects. It is much harder to maintain backwards compatibility, especially if APIs change between versions.
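To make the structure of a string like parquet-1.5.0+cdh5.5.5+181 concrete, here is a small Python sketch that splits it into its parts. The names `CdhPackageVersion` and `parse_cdh_package_version` are hypothetical. Reading the trailing number as a patch level on top of the upstream release is an assumption based on Cloudera's usual versioning convention; the answer above does not state it.

```python
import re
from typing import NamedTuple


class CdhPackageVersion(NamedTuple):
    component: str          # e.g. "parquet"
    upstream_version: str   # closest upstream release, e.g. "1.5.0"
    cdh_release: str        # CDH release the package ships with, e.g. "5.5.5"
    patch_level: int        # ASSUMPTION: patches applied on top of upstream


# <component>-<upstream version>+cdh<CDH release>+<number>
_PATTERN = re.compile(
    r"^(?P<component>[a-z0-9-]+?)"
    r"-(?P<upstream>\d+(?:\.\d+)*)"
    r"\+cdh(?P<cdh>\d+(?:\.\d+)*)"
    r"\+(?P<patches>\d+)$"
)


def parse_cdh_package_version(package_version: str) -> CdhPackageVersion:
    """Split a CDH package version string into its components."""
    match = _PATTERN.match(package_version)
    if match is None:
        raise ValueError(f"unrecognized package version: {package_version!r}")
    return CdhPackageVersion(
        component=match.group("component"),
        upstream_version=match.group("upstream"),
        cdh_release=match.group("cdh"),
        patch_level=int(match.group("patches")),
    )


if __name__ == "__main__":
    print(parse_cdh_package_version("parquet-1.5.0+cdh5.5.5+181"))
```

The upstream part tells you which Apache tag is the nearest baseline; the rest says the code has diverged from it for the given CDH release.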

Upgrade Elasticsearch from 0.90.7 to 1.6.1

I am new to Elasticsearch and I mistakenly installed version 0.90.7 on my server. Now I want to install Shield, and for that I have to upgrade Elasticsearch to 1.5 or above. So I need to upgrade without losing the existing data. Can anybody help me upgrade to 1.6.1?
You can upgrade Elasticsearch from 0.90.x to 1.x using the following process:
Cluster restart upgrade process
Before upgrading Elasticsearch, it is a good idea to consult the breaking changes docs.
As the Elasticsearch documentation advises, it is also a good idea to back up the data on your system before performing an upgrade.

Which version of Sqoop works with Hadoop 0.20.2?

Does Sqoop 2 work with Hadoop 0.20.2?
Which version of Sqoop is best to download: 1.4.2 or 1.99.1?
Thanks!
Sqoop currently has two main branches. Sqoop 1 is the older, fully functional and mature project, supporting Hadoop 0.20, 1.x, 0.23 and 2.0.x. You can download the bits from here. Please make sure that you download the file ending with "_hadoop-0.20", otherwise you will get weird exceptions.
The second branch is Sqoop 2, which is a redesign of the project. A first cut is available as version 1.99.3. This branch supports only Hadoop 1.x and 2.x and can be downloaded from here. Again, you need to make sure to download the version that matches your Hadoop distribution. The build for Hadoop 1.x may well work on 0.20.2 too, as those versions are not that different; however, nobody has verified that.
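The compatibility statements above can be written down as a small lookup. This is only a sketch of what the answer says, not an official compatibility matrix: the `SUPPORTED` table copies the Hadoop lines named in the answer, and the helper names are made up for illustration.

```python
# Hadoop lines each Sqoop branch supports, as stated in the answer above.
SUPPORTED = {
    "Sqoop 1": ["0.20", "1.x", "0.23", "2.0.x"],
    "Sqoop 2": ["1.x", "2.x"],
}


def _matches(version: str, pattern: str) -> bool:
    """Compare a version such as '0.20.2' against a pattern such as '0.20' or '2.0.x'."""
    version_parts = version.split(".")
    for index, part in enumerate(pattern.split(".")):
        if part == "x":  # wildcard: anything from here on is accepted
            return True
        if index >= len(version_parts) or version_parts[index] != part:
            return False
    return True  # every fixed part of the pattern matched


def sqoop_branches_for(hadoop_version: str) -> list:
    """Return the Sqoop branches listed as supporting the given Hadoop version."""
    return [
        branch
        for branch, patterns in SUPPORTED.items()
        if any(_matches(hadoop_version, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for hadoop in ("0.20.2", "1.0.3", "2.2.0"):
        print(hadoop, "->", sqoop_branches_for(hadoop))
```

For Hadoop 0.20.2 this yields only Sqoop 1, which matches the answer: the Sqoop 2 build for Hadoop 1.x might work there, but that is unverified and so is not encoded in the table.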

Which hadoop version to use?

Both Hadoop in Action and The Definitive Guide build their foundation on the mapred classes, and most of those classes have been deprecated in 0.20.2. The signatures of the new classes are different. Can anyone tell me about the various changes that were made? E.g. the Partitioner class has been deprecated; how is the new Reducer going to provide that feature? What concepts changed in 0.20.2?
What should I use? On the Hadoop wiki, I see:
Download
1.0.X - current stable version, 1.0 release
1.1.X - current beta version, 1.1 release
2.X.X - current alpha version
0.23.X - similar to 2.X.X but missing NN HA.
0.22.X - does not include security
0.20.203.X - legacy stable version
0.20.X - legacy version
Does that mean the mapred classes were deprecated and have been reintroduced? Which Hadoop version should I use: 0.20.2 or 1.0.x?
Please check this out; it explains the versioning of Hadoop development: http://www.cloudera.com/blog/2012/04/apache-hadoop-versions-looking-ahead-3/
It should give you an idea of why the version numbers are so complex.
P.S. I'm using v1.0.3 for my system :)
That is an April Fools Day post. :)
But anyone can agree the versions are misleading at best.
