Hadoop on Mesos uses only one node?

I have successfully set up a Mesos 0.22.1 cluster on 5 nodes. I can run Marathon and Chronos tasks on all slave nodes. Now I’m trying to run Hadoop jobs using the Mesos scheduler. I followed a very good tutorial and could run the wordcount test job. But when I try to run a larger job (loading data from Kafka to HDFS using Camus), the job runs without errors but uses only one node with one task tracker, even though it has 30 map tasks in total and my nodes are configured to run 2 map tasks in parallel.
What am I missing? Shouldn’t the JobTracker split the job so that it runs in parallel on all available nodes, using 2 map slots on each node?
What is also strange is that the cluster summary on the JobTracker web page reports only 1 available node. Is this correct behavior?
Any ideas are greatly appreciated!
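To put a rough number on the difference, here is a small sketch of the arithmetic. It assumes every map task takes about the same time and that all 5 nodes could host a task tracker; the function name `map_waves` is made up for illustration.

```python
import math


def map_waves(map_tasks: int, task_trackers: int, slots_per_tracker: int = 2) -> int:
    """Return how many sequential rounds are needed to run all map tasks.

    Assumes each tracker runs `slots_per_tracker` map tasks at a time
    and that all tasks take roughly the same time (a simplification).
    """
    capacity = task_trackers * slots_per_tracker  # map tasks running at once
    return math.ceil(map_tasks / capacity)


# What I see now: one task tracker with 2 map slots
waves_now = map_waves(map_tasks=30, task_trackers=1)

# What I expected, assuming all 5 nodes host a task tracker
waves_expected = map_waves(map_tasks=30, task_trackers=5)
```

With one task tracker the 30 map tasks need 15 rounds; with five task trackers they would need 3.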

Related

Difference between local and yarn in hadoop

I have been trying to install Hadoop on a single node following the instructions written here. There are two sets of instructions, one for running a MapReduce job locally and another for running it on YARN.
What is the difference between running a MapReduce job locally and running it on YARN?
If you use local, the map and reduce tasks run in the same JVM. This mode is usually used for debugging code. If you use YARN, the ResourceManager (part of MRv2) comes into play, and the mappers and reducers run in separate JVMs, either on different nodes or, in pseudo-distributed mode, on the same node.
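The switch between the two modes is the `mapreduce.framework.name` property, normally set in `mapred-site.xml`. As a minimal sketch, the helper below (a made-up name, `mapred_site_xml`) generates that fragment for either mode. It only handles `local` and `yarn`, and a real file would usually carry other properties as well.

```python
import xml.etree.ElementTree as ET


def mapred_site_xml(framework: str) -> str:
    """Build a minimal mapred-site.xml body selecting the execution mode.

    `framework` is the value for mapreduce.framework.name:
    "local" runs tasks in the client JVM, "yarn" submits them to YARN.
    """
    if framework not in ("local", "yarn"):
        raise ValueError("this sketch only handles 'local' and 'yarn'")

    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "mapreduce.framework.name"
    ET.SubElement(prop, "value").text = framework
    return ET.tostring(configuration, encoding="unicode")


local_conf = mapred_site_xml("local")  # single JVM, handy for debugging
yarn_conf = mapred_site_xml("yarn")    # separate JVMs managed by YARN
```

The same property can also be set per job, which makes it possible to debug a job locally without touching the cluster-wide configuration.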

Submitting parallel jobs from the same client to Hadoop

I have a three-node Hadoop 2.6 cluster on which I tried to run multiple instances of TestDFSIO in parallel by putting "&" at the end of each command. But it turns out that only one of those jobs gets submitted and processed by the cluster; the rest are not even submitted (they are somehow thrown away). So I was wondering whether this has anything to do with Hadoop's YARN or MapReduce options, or with anything else.

hadoop yarn resource management

I have a Hadoop cluster with 10 nodes. Out of the 10 nodes, on 3 of them, HBase is deployed. There are two applications sharing the cluster.
Application 1 writes and reads data from Hadoop HDFS. Application 2 stores data in HBase. Is there a way in YARN to ensure that Hadoop M/R jobs launched by application 1 do not use the slots on the HBase nodes? I want only the HBase M/R jobs launched by application 2 to use the HBase nodes.
This is needed to ensure enough resources are available for application 2 so that the HBase scans are very fast.
Any suggestions on how to achieve this?
If you run HBase and your applications on YARN, the application masters (of HBase itself and of the MR jobs) can request up to the maximum of the available resources on the data nodes.
Are you aware of the Hortonworks project Hoya (HBase on YARN)?
One of its features in particular is:
Run MR jobs while maintaining HBase’s low latency SLAs

Task tracker running but no job is submitted to it

I am running an EMR cluster with 4 core nodes. I noticed on the AWS console that the number of task trackers has dropped from 4 to 3. I checked all the individual nodes of the cluster, and a task tracker is running on each of them. It seems one of the task trackers is not visible to the job tracker. What could be causing this? Is there a way to rectify it?
Thanks,

Can an oozie instance run jobs on multiple hadoop clusters at the same time?

I have a development Hadoop cluster available for running test jobs, as well as a production cluster. My question is: can I use a single Oozie instance to kick off workflow jobs on multiple clusters?
What are the gotchas? I'm assuming I can just reconfigure the job tracker, namenode, and fs location properties for my workflow depending on which cluster I want the job to run on.
Assuming the clusters are all running the same distribution and version of Hadoop, you should be able to.
As you note, you'll need to adjust the jobtracker and namenode values in your Oozie actions.
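One way to keep this manageable is to generate a `job.properties` file per target cluster. The sketch below is only an illustration: the host names, ports, and application path are hypothetical, and `nameNode` / `jobTracker` are just the property names conventionally referenced as `${nameNode}` and `${jobTracker}` in a workflow definition, so use whatever names your workflow actually expects.

```python
# Hypothetical endpoints for each cluster; replace with your own.
CLUSTERS = {
    "dev": {
        "nameNode": "hdfs://dev-namenode.example.com:8020",
        "jobTracker": "dev-jobtracker.example.com:8021",
    },
    "prod": {
        "nameNode": "hdfs://prod-namenode.example.com:8020",
        "jobTracker": "prod-jobtracker.example.com:8021",
    },
}


def render_job_properties(cluster: str, app_path: str) -> str:
    """Return job.properties text that points a workflow at one cluster.

    `app_path` is the HDFS directory holding workflow.xml, given
    relative to the cluster's namenode (a hypothetical layout).
    """
    endpoints = CLUSTERS[cluster]
    lines = [
        f"nameNode={endpoints['nameNode']}",
        f"jobTracker={endpoints['jobTracker']}",
        # ${nameNode} is left for Oozie to substitute at submission time
        f"oozie.wf.application.path=${{nameNode}}{app_path}",
    ]
    return "\n".join(lines) + "\n"


dev_properties = render_job_properties("dev", "/user/me/apps/my-workflow")
prod_properties = render_job_properties("prod", "/user/me/apps/my-workflow")
```

The workflow definition itself stays identical across clusters; only the properties file passed at submission time changes.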

Resources