Interface InputFormat vs Class InputFormat - Hadoop

I am a newbie to Hadoop. While learning it, I came across two versions of InputFormat:
org/apache/hadoop/mapred/InputFormat
org/apache/hadoop/mapreduce/InputFormat
The explanation of both APIs seems to be the same, but one is an interface and the other is a class. Can someone please explain why there are two APIs with the same description in Hadoop?

MapReduce underwent a complete overhaul in hadoop-0.23, and the result is called MapReduce 2.0 (MRv2) or YARN.
org/apache/hadoop/mapred/InputFormat belongs to the older MRv1 API.
org/apache/hadoop/mapreduce/InputFormat belongs to the newer MRv2 API.
Both libraries provide the same functionality; MRv2 is a rewrite of MRv1 for compatibility with the YARN architecture.
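For a concrete sense of the difference, here is a rough, abridged sketch of the two contracts (paraphrased from the Hadoop Javadoc; check the documentation for your exact version):

// Old API: org.apache.hadoop.mapred.InputFormat is an interface.
public interface InputFormat<K, V> {
  InputSplit[] getSplits(JobConf job, int numSplits) throws IOException;
  RecordReader<K, V> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
      throws IOException;
}

// New API: org.apache.hadoop.mapreduce.InputFormat is an abstract class.
public abstract class InputFormat<K, V> {
  public abstract List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException;
  public abstract RecordReader<K, V> createRecordReader(InputSplit split,
      TaskAttemptContext context) throws IOException, InterruptedException;
}

In practice you pick one API and stay with it: with the new API you set a concrete subclass on the job via job.setInputFormatClass(TextInputFormat.class), while the old API uses JobConf.setInputFormat(...).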

Related

Apache Airflow/Azkaban workflow Schedulers compatibility with Hadoop MRv1

I'm working on a project that relies on Hadoop with the MRv1 architecture (Hadoop-1.1.2). I tried the Oozie scheduler for creating (mapred) workflows but eventually gave up, because it was a nightmare to configure and I couldn't get it to work. I was wondering whether I should try other workflow schedulers such as Azkaban or Apache Airflow. Would they be compatible with my requirements?

Memory requirements for hadoop cluster

I am a starter with Hadoop. How do I find out how much RAM my Hadoop cluster has and how much RAM my application is using? I am using CDH 5.3, but I would prefer to know how to do this for Hadoop clusters in general.

Cassandra and Hadoop

I am new to Cassandra and Hadoop. I am trying to read Cassandra data on an hourly basis and dump it into HDFS. Cassandra and Hadoop are on different clusters. Any pointers on clients/APIs I could use to do this would be much appreciated.
I recommend Java, because Hadoop and Cassandra are both Java based. Astyanax is a good Java Cassandra API.
I've used the org.apache.hadoop API to write to HDFS from Java, but there might be something better out there.
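As a minimal sketch of that approach (the NameNode URI, output path, and row format below are placeholders, and the Cassandra read side is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HourlyCassandraDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address; point this at your HDFS cluster.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    FileSystem fs = FileSystem.get(conf);

    // One output file per hourly run (path is illustrative).
    Path out = new Path("/data/cassandra-dump/" + System.currentTimeMillis() + ".csv");
    try (FSDataOutputStream os = fs.create(out, true)) {
      // Each line would be a row fetched from Cassandra (e.g. via Astyanax or a CQL driver).
      os.writeBytes("row_key,column,value\n");
    }
    fs.close();
  }
}

The same FileSystem API works whether the code runs as a standalone cron job or inside a MapReduce task, which keeps the hourly scheduling concern separate from the HDFS write itself.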

hadoop 2.0 shuffle benchmark

I have found a page saying that Hadoop 2.0 has a built-in benchmark tool for shuffle, but I'm unable to find it.
Could somebody point me to where to look for it? I know that Hadoop 0.20.* ships a test jar, but I can't find the equivalent in Hadoop 2.0.

Is it possible to add a "Combine" step to the Amazon Elastic MapReduce workflow?

I am referring to the Combine step mentioned on the Hadoop wiki. I have been unable to find a reference to it in the AWS documentation, and I'd like to utilize this step.
The documentation for the Combiner is in the Apache documentation, not in the AWS documentation. Amazon Elastic MapReduce supports the 0.18.3 and 0.20.2 versions of Hadoop with custom patches. The Apache MapReduce tutorial shows how the combiner function should be used: call Job.setCombinerClass() to set the combiner class.
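For reference, a minimal driver in the style of the classic WordCount example (class and path names are illustrative) wires the combiner in like this:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in an input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums counts; reused as both the combiner (map side) and the reducer.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count with combiner");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the "Combine" step: pre-aggregates on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

On Elastic MapReduce this runs as an ordinary custom JAR step; the combiner needs no EMR-specific configuration because it is part of the job itself.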
