hadoop 2.0 shuffle benchmark - hadoop

I found a page saying that Hadoop 2.0 has a built-in benchmark test tool for shuffle,
but I'm unable to find it.
Could somebody tell me where to look for it? I know that Hadoop 0.20.* ships a test jar, but I can't find it in Hadoop 2.0.

Related

I am new to Apache Storm: what are the key differences between Storm 1.1 and Storm 2.0?

I was trying to find any major differences between Storm 1.1 and Storm 2.0.
Is there any difference in setting up a cluster for either version?
(I read on the official website about the new Java-based implementation, but has anyone seen any practical difference between these two versions?)
In addition to reading the changelog at https://www.apache.org/dist/storm/apache-storm-2.0.0/RELEASE_NOTES.html, you can look at https://issues.apache.org/jira/browse/STORM-2306?focusedCommentId=16291947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16291947 for some performance numbers. You can also run your own benchmarks of course.

Interface InputFormat vs Class InputFormat

I am a newbie to Hadoop. While trying to learn it, I came across two versions of InputFormat:
org/apache/hadoop/mapred/InputFormat
org/apache/hadoop/mapreduce/InputFormat
The descriptions of both APIs seem to be the same, but one is an interface and the other is a class. Can someone please explain why Hadoop has two APIs with the same description?
MapReduce underwent a complete overhaul in hadoop-0.23, and the result is called MapReduce 2.0 (MRv2) or YARN.
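org/apache/hadoop/mapred/InputFormat belongs to the old MapReduce API, where it is an interface.
org/apache/hadoop/mapreduce/InputFormat belongs to the new MapReduce API (introduced in Hadoop 0.20), where it is an abstract class.
Both provide the same functionality, and both APIs remain usable on Hadoop 2.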
MRv2 is a rewrite of MRv1 to fit the YARN architecture.
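To make the interface-versus-class difference concrete, here is a minimal plain-Java sketch. OldStyleInputFormat and NewStyleInputFormat are hypothetical stand-ins, not the Hadoop types, and their methods are heavily simplified (the real ones take job and context arguments and return InputSplit objects). The sketch only illustrates one commonly cited reason for preferring an abstract class: a concrete method such as describe() can be added to it later without breaking existing subclasses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Hadoop classes.
class InputFormatShapes {

    // Shape of the org.apache.hadoop.mapred flavour: an interface.
    interface OldStyleInputFormat {
        String[] getSplits(int numSplits);
    }

    // Shape of the org.apache.hadoop.mapreduce flavour: an abstract class.
    static abstract class NewStyleInputFormat {
        abstract List<String> getSplits();

        // A concrete method added to the base class; subclasses inherit it unchanged.
        String describe() {
            return getSplits().size() + " split(s)";
        }
    }

    // Implements the interface with a lambda that names each split by index.
    static OldStyleInputFormat oldExample() {
        return numSplits -> {
            String[] splits = new String[numSplits];
            for (int i = 0; i < numSplits; i++) {
                splits[i] = "split-" + i;
            }
            return splits;
        };
    }

    // Extends the abstract class; only the abstract method has to be written.
    static NewStyleInputFormat newExample() {
        return new NewStyleInputFormat() {
            @Override
            List<String> getSplits() {
                return Arrays.asList("split-0", "split-1");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(oldExample().getSplits(2)));
        System.out.println(newExample().describe());
    }
}
```

In real code you would pick one package and stay with it for the whole job, since the mapred and mapreduce types are not interchangeable.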

How to integrate apache-nutch-1.9 and Hadoop 2.3.0-cdh5.1.0?

I'm very new to Nutch and was trying to integrate Nutch 1.9 with Hadoop 2.3.0-cdh5.1.0, but I am getting exceptions like the one below:
Injector: java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.nutch.crawl.Injector.inject(Injector.java:297)
at org.apache.nutch.crawl.Injector.run(Injector.java:380)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.crawl.Injector.main(Injector.java:370)
I have a few questions about this:
How do I solve this issue? Also, can I really integrate the version of Hadoop I am using with Nutch 1.9?
Thanks,
Sandeep

Mapping between hadoop-1.2.1 and hadoop-2.2.0?

I am currently reading Hadoop in Action. The book is very good, but it uses Hadoop 1.2.1 to explain and showcase all the examples, and I am using Hadoop 2.2.0.
Does anybody know where I can find full documentation of the Hadoop API changes, and a simple mapping between 1.2.1 and 2.2.0?
For example,
DataJoinMapperBase, DataJoinReducerBase, and TaggedMapOutput
are not present in 2.2.0, and I am looking for their counterparts in 2.2.0 :)
Thanks
"Hadoop: The Definitive Guide, Third Edition" by Tom White
supports Hadoop v2.2.
The source code is available on GitHub: https://github.com/tomwhite/hadoop-book
As mentioned on GitHub, this version of the code has been tested with:
* Hadoop 1.2.1/0.22.0/0.23.x/2.2.0
* Avro 1.5.4
* Pig 0.9.1
* Hive 0.8.0
* HBase 0.90.4/0.94.15
* ZooKeeper 3.4.2
* Sqoop 1.4.0-incubating
* MRUnit 0.8.0-incubating
Regarding your question
Hadoop 2.2 uses the MapReduce v2 API, while Hadoop 1.x uses the old MapReduce API. Check this book; it clearly explains the differences in MapReduce code between 1.x and 2.2.
Hope it helps.

Is it possible to add a "Combine" step to the Amazon Elastic MapReduce workflow?

I am referring to the Combine step mentioned on the Hadoop wiki. I have been unable to find a reference to it in the AWS documentation, and I'd like to utilize this step.
Documentation for the combiner is in the Apache Hadoop documentation, not the AWS documentation. Amazon Elastic MapReduce supports Hadoop versions 0.18.3 and 0.20.2 with custom patches. The Apache MapReduce tutorial describes how the combiner function should be used. Call Job.setCombinerClass() to set the combiner class (JobConf.setCombinerClass() in the older org.apache.hadoop.mapred API).
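If it helps to see what a combiner actually does, here is a small plain-Java simulation. It uses no Hadoop classes: CombinerSketch, map() and combine() are hypothetical names made up for this illustration, and the word-count logic is only an assumed example workload. The idea is that the combiner pre-aggregates map output on the map side, so fewer records have to be shuffled to the reducers.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; CombinerSketch is a hypothetical name, not a Hadoop class.
class CombinerSketch {

    // Simulated map phase: emit one (word, 1) record per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Simulated combiner: sum the counts per key before anything is shuffled.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = map("to be or not to be");
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("records before combine: " + mapOutput.size());
        System.out.println("records after combine:  " + combined.size());
        System.out.println(combined);
    }
}
```

In a real job the combiner is a Reducer class registered through setCombinerClass(), and Hadoop may run it zero, one, or several times per map task, so the operation should be commutative and associative (summing counts qualifies).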
