When I go to the DC/OS GitHub, I am bombarded with directories and files. Where exactly can I find the source code of DC/OS? I assumed there would be a directory named src, but I could not find one... Thanks for your time.
You can find it here https://github.com/dcos
DC/OS is a set of software components configured to work together to deliver a complete Platform as a Service (PaaS). The main parts of DC/OS are:
Mesos - the heart of the whole DC/OS
Marathon - a scheduler that runs long-running services on Mesos
Metronome - a scheduler for jobs (can be replaced with Chronos)
Minuteman - a distributed load balancer (layer 4)
Marathon-LB - a service discovery and load-balancing tool
I'm new to Kubernetes and trying to understand which resources reside on which node.
Can anyone please confirm what kinds of Kubernetes resources, other than pods, a worker node usually hosts? On which nodes do other resources such as Ingress, PersistentVolumeClaim, Service, ConfigMap, etc. reside?
Hi Pramode Kumar Pandit,
On workers you will run your workload, such as deployed microservices; the ingress runs on your master node.
Where persistent volumes run will depend on how you configure your ICP cluster.
If you look at this page you will get an idea of the architecture used by ICP:
https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.2/getting_started/architecture.html
I made a Spark application that analyzes file data. Since the input data could be big, it is not enough to run my application standalone on one machine. With one more physical machine, how should I architect it?
I'm considering using Mesos as the cluster manager, but I'm pretty much a noobie at HDFS. Is there any way to do it without HDFS (for sharing the file data)?
Spark supports a couple of cluster managers: YARN, Mesos, and Standalone. You may start with Standalone mode, which means you work with your cluster's file system.
If you are running on Amazon EC2, you may refer to the following article in order to use Spark's built-in scripts that launch a Spark cluster automatically.
If you are running in an on-prem environment, the way to run in Standalone mode is as follows:
-Start a standalone master
./sbin/start-master.sh
-The master will print out a spark://HOST:PORT URL for itself. For each worker (machine) in your cluster, use that URL in the following command:
./sbin/start-slave.sh <master-spark-URL>
-To validate that the worker was added to the cluster, open http://localhost:8080 on your master machine; the Spark UI shows more info about the cluster and its workers.
There are many more parameters to play with. For more info, please refer to this documentation.
Hope I have managed to help! :)
I have a cluster of 3 Mesos slaves on which I have two applications: "redis" and "memcached". Redis depends on memcached, and the requirement is that both applications/services should start on the same node rather than on different slave nodes.
So I created the application group and added the dependency properly in the JSON file. After launching the JSON file via the "v2/groups" REST API, I observe that sometimes both applications in the group start on the same node, but sometimes they start on different slaves, which breaks our requirement.
So the intent/requirement is: if either application fails to start on a slave, both applications should fail over to another slave node. Also, can I configure the JSON file to tell Marathon to start the application group on slave-1 (a specific slave) first if it is available, and otherwise on another slave in the cluster? If for some reason the application group starts on another slave, can Marathon relaunch the application group on slave-1 once it is available to serve requests?
Thanks in advance for help.
Edit/Update (2):
Mesos, Marathon, and DC/OS support for pods is available now:
DC/OS: https://dcos.io/docs/1.9/usage/pods/using-pods/
Mesos: https://github.com/apache/mesos/blob/master/docs/nested-container-and-task-group.md
Marathon: https://github.com/mesosphere/marathon/blob/master/docs/docs/pods.md
I assume you are talking about Marathon apps.
Marathon application groups don't have any semantics concerning co-location on the same node, and the same is true for dependencies.
You seem to be looking for a Kubernetes-like pod abstraction in Marathon, which is on the roadmap but not yet available (see the update above :-)).
Hope this helps!
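With that pods support, the co-location the question asks for can be expressed directly, since all containers of a pod instance are placed on the same agent. Below is a minimal sketch of a Marathon pod definition (POSTed to /v2/pods on Marathon 1.4+ / DC/OS 1.9); the pod ID, commands, and resource sizes are illustrative assumptions, not values from the question:
{
  "id": "/cache-pod",
  "scaling": { "kind": "fixed", "instances": 1 },
  "containers": [
    {
      "name": "memcached",
      "exec": { "command": { "shell": "memcached -u nobody" } },
      "resources": { "cpus": 0.5, "mem": 128 }
    },
    {
      "name": "redis",
      "exec": { "command": { "shell": "redis-server" } },
      "resources": { "cpus": 0.5, "mem": 128 }
    }
  ],
  "networks": [ { "mode": "host" } ]
}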
I think this should be possible (as a workaround) if you specify the correct app constraints within the group's JSON.
Have a look at the example request at
https://mesosphere.github.io/marathon/docs/generated/api.html#v2_groups_post
and the constraints syntax at
https://mesosphere.github.io/marathon/docs/constraints.html
e.g.
"constraints": [["hostname", "CLUSTER", "slave-1"]]
should do. The downside is that there will be no automatic failover to another slave that way. Still, I'd be curious why both apps specifically need to run on the same slave node...
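To make that concrete, here is a rough sketch of a group definition (launched via the v2/groups REST API mentioned in the question) that pins both apps to slave-1 and declares the dependency; the IDs, commands, and resource sizes are illustrative assumptions:
{
  "id": "/cache",
  "apps": [
    {
      "id": "/cache/memcached",
      "cmd": "memcached -u nobody",
      "cpus": 0.5,
      "mem": 128,
      "instances": 1,
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    },
    {
      "id": "/cache/redis",
      "cmd": "redis-server",
      "cpus": 0.5,
      "mem": 128,
      "instances": 1,
      "dependencies": ["/cache/memcached"],
      "constraints": [["hostname", "CLUSTER", "slave-1"]]
    }
  ]
}
Note that the dependencies entry only controls startup/upgrade order; it is the CLUSTER constraint on the hostname that forces co-location, with the failover limitation described above.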
I recently came across Apache Mesos and successfully deployed my Storm topology over Mesos.
I want to try running Storm topologies/Hadoop jobs on Apache Marathon (I had issues running Storm directly on Apache Mesos using the mesos-storm framework).
I couldn't find any tutorial/article that lists the steps for launching Hadoop/Spark tasks from Apache Marathon.
It would be great if anyone could provide any help or information on this topic (possibly a JSON job definition for Marathon for launching a Storm/Hadoop job).
Thanks a lot
Thanks for your reply. I went ahead and deployed a Storm-Docker cluster on Apache Mesos with Marathon. For service discovery I used HAProxy. This setup allows services (Nimbus, ZooKeeper, etc.) to talk to each other via ports, so, for example, adding multiple instances of a service is not a problem, since the cluster will find them by their ports and load-balance requests across all instances of a service. The following GitHub project has the Marathon recipes and Docker images: https://github.com/obaidsalikeen/storm-marathon
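For anyone looking for the general shape of such a recipe, a Marathon app definition for a Dockerized service looks roughly like the following sketch; the app ID, image name, ports, and resources are made-up placeholders rather than values from the linked project:
{
  "id": "/storm/nimbus",
  "cpus": 1,
  "mem": 1024,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/storm-nimbus:latest",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 6627, "hostPort": 0, "servicePort": 10001, "protocol": "tcp" }
      ]
    }
  }
}
The fixed servicePort is what the HAProxy-based service discovery uses to route requests to whichever hosts the instances actually land on.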
Marathon is intended for long-running services, so you could use it to start your JobTracker or Spark scheduler, but you're better off launching the actual batch jobs like Hadoop/Spark tasks on a batch framework like Chronos (https://github.com/airbnb/chronos). Marathon will restart tasks when they complete/fail, whereas Chronos (a distributed cron with dependencies) lets you set up scheduled jobs and complex workflows.
While a little outdated, the following tutorial gives a good example.
http://mesosphere.com/docs/tutorials/etl-pipelines-with-chronos-and-hadoop/
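As a rough illustration of the Chronos side, a scheduled batch job is registered by POSTing a JSON job definition to Chronos (e.g. the /scheduler/iso8601 endpoint); the name, schedule, and command below are invented for the example:
{
  "name": "hadoop-wordcount",
  "command": "hadoop jar /opt/examples/wordcount.jar /input /output",
  "schedule": "R/2015-06-01T00:00:00Z/PT24H",
  "epsilon": "PT30M",
  "owner": "ops@example.com",
  "cpus": 1,
  "mem": 2048
}
The schedule field uses the ISO 8601 repeating-interval format (here: repeat every 24 hours), and epsilon gives Chronos a window within which a missed start is still attempted.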
Can a Hadoop YARN instance manage nodes in different places on Earth, on different networks? Can it manage nodes that use different platforms?
Every note about YARN I have found says that YARN manages clusters, but if the app I deploy is written in Java, then it should probably run on the nodes regardless of their hardware.
Similarly, YARN seems general enough to support more than just a LAN.
YARN is not platform aware. It is also not aware of how application processes on different hosts communicate with each other to perform the work.
At the same time, the YARN application master is started as a command line, and therefore any node in the cluster with enough resources should be able to run it.
If not every platform is capable of running a specific application master, then YARN would need to be aware of that. Today it cannot, but I can imagine platform becoming a special kind of resource, and then YARN would select an appropriate node.
Regarding LAN: if you have an application master that knows how to manage a job across several LANs, it should be fine with YARN.