I want to submit a couple of jobs to a cluster, but I want them to execute only if there are no other jobs on queue. How can I do this?
The cluster uses TORQUE+MAUI management system.
Maui may not be able to do this. I have looked over the documentation for Maui for a negative or positive affinity and hpenable but I did not see any references to it. I did see references in the main product Moab which is based off Maui. These may not be in Maui but I though I would mention them in case it works for you.
http://docs.adaptivecomputing.com/8-1-0/enterprise/help.htm#topics/moabWorkloadManager/topics/resourceAccess/managingreservations.html#affinity
So in the mainstream product you would do something like:
SRCFG[res1] CLASSLIST=prio+,~lowprio-
This tells the software to positively draw in prio class jobs. The ~ for lowpro tells the software to ignore this credential until all of the prio jobs are evaluated/started/ran. The "-" says to not run in this reservation unless no other resources are available. You then just add a task count to consume the entire cluster.
The other option is to submit all of those jobs with a hold and have a cron job do a qstat and check for other jbos before releasing the jobs with a releasehold .
Related
I have a cluster that does not allow direct ssh access, but does permit me to submit commands through a proprietary interface.
Is there any way to either manually launch OpenMPI jobs, or documentation on how to write a custom launcher?
I don't think you can do it without breaking some kind of agreement.
I assume you have some kind of a web-based interface that allows you to fill certain fields and maybe upload data. Or something similar. What this interface will probably do - is it's going to generate a request/file for a scheduler. Most likely, SGE or PBS. The direct access to the cluster is limited in order to
organize task priorities and order
prevent users from hogging the machines
make it easier to launch complicated tasks requiring complicated machine(s) configuration
So, you, effectively, want to go around a scheduler. I don't think you can or you should.
However, usually, the clusters have so-called, head nodes which would allow SSH access to them. These nodes would serve as a place to submit scheduler requests from them and, maybe, do small compilation/result processing (with very limited resources). Such configuration would eliminate the web interface but still leaves a very important scheduler for a cluster that is used by many people concurrently.
I have 2 sets of storm topologies in use today, one is up 24/7, and does it's own work.
The other, is deployed on demand, and handles a much bigger loads of data.
As of today, we have N supervisors instances, all from the same type of hardware (CPU/RAM), I'd like my on demand topology to run on stronger hardware, but as far as I know, there's no way to control which supervisor is assigned to which topology.
So if I can't control it, it's possible that the 24/7 topology would assign one of the stronger workers to itself.
Any ideas, if there is such a way?
Thanks in advance
Yes, you can control which topologies go where. This is the job of the scheduler.
You very likely want either the isolation scheduler or the resource aware scheduler. See https://storm.apache.org/releases/2.0.0-SNAPSHOT/Storm-Scheduler.html and https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html.
The isolation scheduler lets you prevent Storm from running any other topologies on the machines you use to run the on demand topology. The resource aware scheduler would let you set the resource requirements for the on demand topology, and preferentially assign the strong machines to the on demand topology. See the priority section at https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html#Topology-Priorities-and-Per-User-Resource.
I just noticed the fact that many Pig jobs on Hadoop are killed due to the following reason: Container preempted by scheduler
Could someone explain me what causes this, and if I should (and am able to) do something about this?
Thanks!
If you have the fair scheduler and a number of different queue's enabled, then higher priority applications can terminate your jobs (in a preemptive fashion).
Hortonworks have a pretty good explanation with more details
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_yarn_resource_mgt/content/preemption.html
Should you do anything about it? Depends if your application is within its SLA's and performing within expectations. General good practice would be to review your job priority and the queue it's assigned to.
If your Hadoop cluster is being used by many business units. then Admins decides queue for them and every queue has its priorities( that too is decided by Admins). If Preemption is enabled at scheduler level,then higher-priority applications do not have to wait because lower priority applications have taken up the available capacity. So in this case lower propriety task must have to release resources, if not available at cluster to let run higher-priority applications.
I setup a Mesos cluster which runs Apache Aurora framework, and i registered 100 cron jobs which run every min on a 5 slave machine pool. I found after scheduled 100 times, the cron jobs stacked in "PENDING" state. May i ask what kind of logs i can inspect and what is the possible problem ?
It could be a couple of things:
Do you still have sufficient resources in your cluster?
Are those resources offered to Aurora? Or maybe only to another framework?
Do you have any task constraints that prevent your tasks from being scheduled?
Possible information source:
What does the tooltip or the expanded status say on the UI? (as shown in the screenshot)
The Aurora scheduler has log files. However normally those are not needed for an end user to figure out why stuff is stuck in pending.
In case you are stuck here, it would probably be the best to drop by in the #aurora IRC channel on freenode.
I am trying to understand how the various components of Mesos work together, and found this excellent tutorial that contains the following architectural overview:
I have a few concerns about this that aren't made clear (either in the article or in the official Mesos docs):
Where are the Schedulers running? Are there "Scheduler nodes" where only the Schedulers should be running?
If I was writing my own Mesos framework, what Scheduler functionality would I need to implement? Is it just a binary yes/no or accept/reject for Offers sent by the Master? Any concrete examples?
If I was writing my own Mesos framework, what Executor functionality would I need to implement? Any concrete examples?
What's a concrete example of a Task that would be sent to an Executor?
Are Executors "pinned" (permanently installed on) Slaves, or do they float around in an "on demand" type fashion, being installed and executed dynamically/on-the-fly?
Great questions!
I believe it would be really helpful to have a look at a sample framework such as Rendler. This will probably answer most of your question and give you feeling for the framework internal.
Let me now try to answer the question which might be still be open after this.
Scheduler Location
Schedulers are not on on any special nodes, but keep in mind that schedulers can failover as well (as any part in a distributed system).
Scheduler functionality
Have a look at Rendler or at the framework development guide.
Executor functionality/Task
I believe Rendler is a good example to understand the Task/Executor relationship. Just start reading the README/description on the main github page.
Executor pinning
Executors are started on each node when the first Task requiring such executor is send to this node. After this it will remain on that node.
Hope this helped!
To add to js84's excellent response,
Scheduler Location: Many users like to launch the schedulers via another framework like Marathon to ensure that if the scheduler or its node dies, then it can be restarted elsewhere.
Scheduler functionality: After registering with Mesos, your scheduler will start getting resource offers in the resourceOffers() callback, in which your scheduler should launch (at least) one task on a subset (or all) of the resources being offered. You'll probably also want to implement the statusUpdate() callback to handle task completion/failure.
Note that you may not even need to implement your own scheduler if an existing framework like Marathon/Chronos/Aurora/Kubernetes could suffice.
Executor functionality: You usually don't need to create a custom executor if you just want to launch a linux process or docker container and know when it completes. You could just use the default mesos-executor (by specifying a CommandInfo directly in TaskInfo, instead of embedded inside an ExecutorInfo). If, however you want to build a custom executor, at minimum you need to implement launchTask(), and ideally also killTask().
Example Task: An example task could be a simple linux command like sleep 1000 or echo "Hello World", or a docker container (via ContainerInfo) like image : 'mysql'. Or, if you use a custom executor, then the executor defines what a task is and how to run it, so a task could instead be run as another thread in the executor's process, or just become an item in a queue in a single-threaded executor.
Executor pinning: The executor is distributed via CommandInfo URIs, just like any task binaries, so they do not need to be preinstalled on the nodes. Mesos will fetch and run it for you.
Schedulers: are some strategy to accept or reject the offer. Schedulers we can write our own or we can use some existing one like chronos. In scheduler we should evaluate the resources available and then either accept or reject.
Scheduler functionality: Example could be like suppose say u have a task which needs 8 cpus to run, but the offer from mesos may be 6 cpus which won't serve the need in this case u can reject.
Executor functionality : Executor handles state related information of your task. Set of APIs you need to implement like what is the status of assigned task in mesos slave. What is the num of cpus currently available in mesos slave where executor is running.
concrete example for executor : chronos
being installed and executed dynamically/on-the-fly : These are not possible, you need to pre configure the executors. However you can replicate the executors using autoscaling.