Why is DCOS required if we can deploy a Mesos cluster directly? - mesos

I have a query: if we can use a Mesos cluster by directly installing master and slave nodes, then why do we need DCOS? Is it that DCOS provides additional support on top of the Mesos cluster? Please elaborate on this part.

Depends on your needs :-):
Here is what, in my opinion, the Community Edition of DCOS (the Enterprise Edition includes more proprietary features such as security) adds over a self-managed setup of Mesos:
Easy setup, including Marathon and MesosDNS.
Command line interface with one-click installs from the Universe. I personally especially like the simple installs of these services, as it is really simple to install, for example, HDFS or Cassandra in your cluster (see the sketch after this list). Note: as with the above, you can probably configure such a setup yourself with some effort, as both projects are on GitHub.
Very nice UI
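To make the one-click install concrete, here is a minimal sketch driving the DCOS CLI from Python. It assumes the dcos CLI is installed and already configured to point at your cluster; the package names come from the Universe catalog mentioned above.

```python
import subprocess

def install_package(name: str) -> None:
    """Install a package from the DCOS Universe via the dcos CLI."""
    # --yes skips the interactive confirmation prompt.
    subprocess.run(["dcos", "package", "install", name, "--yes"], check=True)

if __name__ == "__main__":
    # One-click-style installs of HDFS and Cassandra from the Universe.
    for package in ("hdfs", "cassandra"):
        install_package(package)
```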
So overall, I would say DCOS provides a very easy and tested best-practice setup of Mesos and its ecosystem.
Hope this helps!

Related

How do I install components such as Apache Drill and Apache Hue in IBM Bluemix BigInsights Apache Hadoop?

I am new to the IBM Bluemix platform and am exploring its BigInsights service. I can see pre-configured components such as Pig, Hive, HBase, and others, but I want to know how I can install services like Drill or, say, Hue, which are not configured by default. Also, ssh to the cluster nodes allows only restricted access with no sudo rights, in case one needs to run yum commands. Does Bluemix allow root access? I cannot find any. Thanks in advance.
As far as I know, it is not possible.
But you can use http://www.softlayer.com/ to build your own IOP (IBM Open Platform) Cluster in the cloud.
If you are interested in IBM's value-adds and just want to try them out, https://www.youtube.com/watch?v=4p7LDeu_qQQ is a nice tutorial for setting up your own cluster via Docker.
This tutorial should still be valid for Hue:
https://developer.ibm.com/hadoop/2015/06/02/deploying-hue-on-ibm-biginsights/
Installing Drill doesn't look complicated:
https://drill.apache.org/docs/installing-drill-in-distributed-mode/
In conclusion: you need to move away from Bluemix if you want a more customised BigInsights. But there are options: SoftLayer, AWS, .. or just your local computer (if you have sufficient resources, since some components like HBase need a minimum number of nodes).
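Once Drill is up on your own cluster, a quick smoke test is its REST API. A minimal sketch, assuming a Drillbit is reachable on the default web port 8047 and drill-host is a placeholder for one of your nodes:

```python
import json
import urllib.request

# Placeholder hostname; point this at a node running a Drillbit.
DRILL_URL = "http://drill-host:8047/query.json"

def run_query(sql: str) -> dict:
    """Submit a SQL query to Drill's REST API and return the JSON response."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        DRILL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    # Querying the built-in sys.version table exercises the whole query path.
    print(run_query("SELECT * FROM sys.version"))
```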

Does the DCOS installation process work the same with an existing Mesos installation or do we need to start from scratch?

We have an existing Apache Mesos cluster and want to try DCOS in its shiny new Open Source form. However, it would be painful to do a destructive re-install of DCOS. So is it possible to just 'overlay' DCOS on an existing Mesos installation? Would any of the steps change in the DCOS installation guide or could the installer detect the existing Mesos and install DCOS components over it?
I don't think you can simply overlay DC/OS on top of your Mesos cluster. There are multiple reasons for that; one of them is that the configuration of Mesos and Marathon is done differently in DC/OS than in a standalone Mesos setup.

Multi-node Hadoop cluster with Docker

I am in the planning phase of a multi-node Hadoop cluster in a Docker-based environment, so it should be based on a lightweight, easy-to-use virtualized system.
The current architecture (according to the documentation) contains 1 master and 3 slave nodes. The host machine uses the HDFS filesystem and KVM for virtualization.
The whole cloud is managed by Cloudera Manager. There are several Hadoop modules installed on this cluster, and there is also a NodeJS data upload service.
This time I should make the architecture Docker-based.
I have read several tutorials and formed some opinions, but I also have open questions.
A. What do you think, is https://github.com/Lewuathe/docker-hadoop-cluster a good base for my project? I have also found an official image, but it is single-node.
B. How will the system requirements change if I make this in a single container? That would be great, because this architecture should work in different locations, so changes could easily be transferred between them. Synchronization between these so-called clones would be important.
C. Do you have some other ideas, maybe best practices?
As of September 2016 there is no quick answer.
https://github.com/Lewuathe/docker-hadoop-cluster does not seem like a good start, as it would need to be universal enough to cover your option B.
Keep an eye on https://github.com/sequenceiq/hadoop-docker and https://github.com/kiwenlau/hadoop-cluster-docker
To address your question C, you may want to check out BlueData's software platform: http://www.bluedata.com/blog/2015/06/docker-containers-big-data-clusters
It's designed to run multi-node Hadoop clusters in a Docker-based environment and there is a free version available for download (you can also run it in an AWS EC2 instance).
This work has already been done for you, actually:
https://hub.docker.com/r/cloudera/clusterdock/
It includes a pre-packaged multi-node CDH cluster, with Cloudera Manager as an optional component for cluster management and more.
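Whichever image you pick, you can sanity-check the HDFS layer from the host through WebHDFS. A minimal sketch, assuming the container publishes the Hadoop 2.x NameNode web port (50070) to localhost, which is how most of these images are documented to run:

```python
import json
import urllib.request

# Assumes the NameNode web port (Hadoop 2.x default: 50070) is published.
NAMENODE = "http://localhost:50070"

def list_root() -> list:
    """List / via WebHDFS to confirm HDFS is up and reachable."""
    url = NAMENODE + "/webhdfs/v1/?op=LISTSTATUS"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["FileStatuses"]["FileStatus"]

if __name__ == "__main__":
    for status in list_root():
        print(status["type"], status["pathSuffix"])
```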

Does Mesos provide a cluster management UI like DCOS's as an OSS project?

I loved the DCOS demos on Azure. Now I wonder: having a private OpenStack-based cloud, how do I install Mesos with that UI manually? Is it possible, or is it a part of DCOS that they do not provide as an open-source product?
The DCOS Dashboard is pretty cool :-). Currently it is only available via the DCOS beta on AWS and Azure. There will be on-prem packages later on as well, potentially even a community edition. Feel free to contact/follow Mesosphere for updates.
Until then you can use the standard Mesos, Marathon, and Chronos UIs as Alex pointed out.
You can use the Mesos and Marathon web UIs; by default they are available on ports 5050 and 8080, respectively.
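The same data behind those UIs is exposed as JSON over HTTP, so you can script against a stock cluster too. A minimal sketch, assuming mesos-master is a placeholder hostname for a node running both the Mesos master and Marathon on their default ports:

```python
import json
import urllib.request

# Placeholder hostname; use your Mesos master's address.
HOST = "mesos-master"

def fetch(url: str) -> dict:
    with urllib.request.urlopen(url) as response:
        return json.load(response)

if __name__ == "__main__":
    # Mesos master state: a cluster-wide view of agents and frameworks.
    state = fetch(f"http://{HOST}:5050/master/state.json")
    print("agents:", len(state.get("slaves", [])))

    # Marathon's REST API: list the long-running apps it manages.
    apps = fetch(f"http://{HOST}:8080/v2/apps")
    for app in apps["apps"]:
        print("app:", app["id"])
```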

Recommendations for Hadoop on EC2?

When running Hadoop in EC2, I seem to have two options:
A: Manage the cluster myself, using the EC2-specific shell scripts that come with Hadoop.
B: Use Elastic MapReduce, and pay a little extra for the convenience.
I'm leaning towards B, but I'd appreciate some advice from people with more experience. Here are my questions:
Are there any tasks that can be done with one of these methods but not the other?
Are there other options besides these two that I'm overlooking?
If I choose B, how easy would it be to go back to A? That is, what's the danger of vendor lock-in?
Third option:
You can use Apache Whirr to set up a Hadoop cluster on EC2 (Rackspace is also supported).
I have been told by people close to the Amazon Elastic MapReduce (EMR) development team that there are at least two other advantages to using EMR: a) Amazon is actively applying bug fixes and performance enhancements to the Hadoop code base used on EMR, and b) Amazon employs a high performance network between EMR servers and S3 servers that may not be available between EC2 servers and S3 servers.
UPDATE: See #mat's comments that refute the rumored advantages of using EMR.
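For what it's worth, option B is scriptable end to end. A minimal sketch of launching a transient EMR cluster with the boto3 SDK; the bucket name and key pair are placeholders, and the release label and instance types are illustrative only:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Placeholders: my-log-bucket, MyKeyPair. Sizes are illustrative only.
response = emr.run_job_flow(
    Name="example-cluster",
    ReleaseLabel="emr-5.36.0",            # an EMR release bundling Hadoop
    Applications=[{"Name": "Hadoop"}],
    LogUri="s3://my-log-bucket/emr-logs/",
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "MyKeyPair",
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate once steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("cluster id:", response["JobFlowId"])
```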
Disclaimer: I'm the founder of Axemblr.com
There are also commercial alternatives you can use. Axemblr Tool for Cloudera CDH3 is a tool we are building that can deploy a cluster in just a few minutes with all you need (including Cloudera Hue, Mahout & Pig).
We are also building an alternative to EMR that's fully compatible from an API perspective, targeted at private clouds.
If you are wondering why it makes sense to run CDH on EC2 rather than EMR see:
http://www.quora.com/What-are-the-advantages-disadvantages-running-Clouderas-distribution-for-Hadoop-on-EC2-instances-rather-than-using-Amazons-Elastic-Map-Reduce-Service
