OptaPlanner, earliest job start time - job-scheduling

Problem: I am trying to schedule multiple SKUs across 3-4 sequential activities optimally under constrained resources.
Tool: OptaPlanner job scheduler
Issue: Each activity has a predecessor (which is not being modelled) and can only start after the predecessor is complete, so each job can start only after a pre-defined X hours.
Question: How can I provide this X hours as an input to the job scheduler?
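A minimal sketch of one way to feed that in, assuming each job's earliest start is modelled as a fixed "ready time" problem fact and the solver assigns the start time. The @PlanningEntity/@PlanningVariable annotations are standard OptaPlanner, but the field names and the "startTimeRange" value range id are illustrative, not the official example:

    import org.optaplanner.core.api.domain.entity.PlanningEntity;
    import org.optaplanner.core.api.domain.variable.PlanningVariable;

    // Sketch: the pre-defined X hours becomes a per-job readyTimeHours fact.
    @PlanningEntity
    public class Job {
        private long readyTimeHours;   // the X-hours input for this job
        private Long startTimeHours;   // assigned by the solver

        public Job() {}                // OptaPlanner requires a no-arg constructor

        public Job(long readyTimeHours) {
            this.readyTimeHours = readyTimeHours;
        }

        @PlanningVariable(valueRangeProviderRefs = "startTimeRange") // hypothetical range id
        public Long getStartTimeHours() { return startTimeHours; }
        public void setStartTimeHours(Long startTimeHours) { this.startTimeHours = startTimeHours; }

        // A hard constraint in the score calculation would penalize this amount,
        // forcing startTimeHours >= readyTimeHours in any feasible solution.
        public long hoursStartedTooEarly() {
            return startTimeHours == null ? 0 : Math.max(0, readyTimeHours - startTimeHours);
        }
    }

If the predecessors were modelled explicitly, the ready time could instead be derived from the predecessor's end time plus the delay; with unmodelled predecessors, a fixed per-job ready time is the simplest way to pass the X hours in.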

Related

Is there any way to know which job will start next in qsub

On our institute's (IISc Bangalore) supercomputer, we submit jobs using qsub. Jobs start running according to the following:
(1) Their wall time (expected completion time)
(2) Their position in the respective queue (small, medium, large, etc.)
So it is very difficult to know which job will start after a currently running job finishes. But qsub probably has a list of its own by which it starts a new job immediately after another finishes.
Is there any way to know which job will start next? Is there any command for this?
Thank you.
Unfortunately, there is no clear way to know which job will run next on a supercomputing system. A job's start depends not only on its wall time or position in the queue but also on many other factors determined by site-level policy, scheduling strategies, and priorities. There can be some internal job ranking (priorities) chosen by the institute based on factors like power management, load balancing, etc.
On the other hand, there is a lot of research on predicting the waiting time for job allocation. TeraGrid systems provide estimated waiting times. Also see link1 and link2 (by SERC) for more information about predicting waiting times.

weighted job scheduling with more than one interval for each job

I want to schedule jobs that have more than one interval per job. I tried some algorithms like MWIS (maximum weighted independent set), but they didn't work properly (this pdf).
How can I deal with this problem?
And can the dynamic programming approach for weighted job scheduling with one interval be adapted to multiple intervals?
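For reference, here is a minimal sketch of the classic single-interval weighted job scheduling DP the question refers to; the class and method names are illustrative, and this is not a multi-interval solution. Jobs are sorted by end time, dp[i] is the best total weight among the first i jobs, and each job either joins the latest compatible prefix or is skipped:

    import java.util.Arrays;
    import java.util.Comparator;

    public class WeightedIntervalScheduling {
        record Job(int start, int end, int weight) {}  // Java 16+ record

        static int maxWeight(Job[] jobs) {
            Arrays.sort(jobs, Comparator.comparingInt(Job::end)); // sort by finish time
            int n = jobs.length;
            int[] dp = new int[n + 1]; // dp[i] = best total weight using the first i jobs
            for (int i = 1; i <= n; i++) {
                int p = 0; // latest job (1-based) that ends before jobs[i-1] starts
                for (int k = i - 1; k >= 1; k--) {
                    if (jobs[k - 1].end() <= jobs[i - 1].start()) { p = k; break; }
                }
                // either skip job i-1, or take it on top of the best compatible prefix
                dp[i] = Math.max(dp[i - 1], dp[p] + jobs[i - 1].weight());
            }
            return dp[n];
        }
    }

Extending this directly to multiple intervals per job is not straightforward: selecting a job commits all of its intervals at once, so the "latest compatible job" recurrence above no longer applies, which is presumably why MWIS-style approaches were tried instead.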

What is the best job scheduler policy to prioritize small HPC jobs for weak-scaling tests?

I am interested in performing weak-scaling tests on an HPC cluster. To achieve this, I run several small tests on 1, 2, 4, 8, 16, 32, and 64 nodes, with each simulation taking from under a minute up to a maximum of 1 hour. However, the jobs stay in the queue (a 1-hour queue) for several days before the test results are available.
I have two questions:
Is there a way to prioritize jobs in the job scheduler, given that most tests take less than a minute yet I have to wait several days for them?
Can such a job scheduling policy invite abuse of HPC resources, and to what extent? Consider a hypothetical example of an HPC simulation on 32 nodes that is divided into several small 1-hour simulations, which then get prioritized because of the solution provided in point 1 above.
Note: the job scheduling and management system used at the HPC center is MOAB. Each cluster node is equipped with 2 Xeon 6140 CPUs @ 2.3 GHz (Skylake), 18 cores each.
Moab's fairshare scheduler may do what you want, or if it doesn't out of the box, may allow tweaking to prioritize jobs within the range you're interested in: http://docs.adaptivecomputing.com/mwm/7-1-3/help.htm#topics/fairness/6.3fairshare.html.

Max-Min and Min-Min algorithms implementation

I'm trying to simulate the Max-Min and Min-Min scheduling algorithms and code them myself in a simulation, but I don't really understand how to implement the way they work in code.
For example, in the FCFS algorithm I use 3 servers (VMs), each with a different speed, and 5 tasks with different arrival times. The first task checks the first server and is scheduled there; the second, if it arrives while the first task is not yet completed, checks availability and is scheduled on the second server. If all 3 servers are occupied, the next task is scheduled on the one with the minimum remaining execution time.
Now for the Min-Min and Min-Max this is the theoretical background:
Min-Min:
Phase 1: First compute the completion time of every task on each machine; then, for every task, select the machine that processes the task in the minimum possible time.
Phase 2: Among all the tasks in the meta-task list, the task with the minimum completion time is selected and assigned to the machine on which that minimum completion time is expected. The task is removed from the meta-task list and the procedure continues until the meta-task list is empty.
Max-Min:
Phase 1: First compute the completion time of every task on each machine; then, for every task, choose the machine that processes the task in the minimum possible time.
Phase 2: Among all the tasks in the meta-task list, the task with the maximum completion time is selected and assigned to that machine. The task is removed from the meta-task list and the procedure continues until the meta-task list is empty.
I get phase 1 for both algorithms: I need to check each task's burst time and each server's speedup (burst / speedup = execution time), and that gives me the best server for each task.
But I can't understand phase 2. For Min-Min, I have to pick the fastest task each time and then schedule it on the fastest server. But then the workload will be imbalanced: as I said, there are 3 servers and one of them is the fastest, let's say the server with ID 1, so the tasks will be scheduled on that one every time, while I also need the other 2 to work.
Same problem with Max-Min: find the worst task, schedule it on the worst server, but only one server is the worst, so the other 2 will not work. How am I supposed to do the balancing, and also take into consideration that the tasks arrive at different times?
If you need anything more just let me know and thanks in advance!
You can find a nice description of both algorithms in A Comparative Analysis of Min-Min and Max-Min Algorithms based on the Makespan Parameter.
The paper gives pseudocode for Min-Min. ETij is the execution time of task ti on resource Rj, rj is the ready time of Rj, and the completion time is CTij = ETij + rj.
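A compact sketch of that loop, assuming the completion times are built from ET and the ready times as above (CTij = ETij + rj); the class and variable names are illustrative:

    import java.util.HashSet;
    import java.util.Set;

    public class MinMin {
        // ET[i][j] = execution time of task i on machine j.
        // Returns assignment[i] = machine chosen for task i.
        static int[] schedule(double[][] ET) {
            int tasks = ET.length, machines = ET[0].length;
            double[] ready = new double[machines];   // rj: when machine j frees up
            int[] assignment = new int[tasks];
            Set<Integer> meta = new HashSet<>();     // the meta-task list
            for (int i = 0; i < tasks; i++) meta.add(i);

            while (!meta.isEmpty()) {
                int bestTask = -1, bestMachine = -1;
                double bestCT = Double.MAX_VALUE;
                // Phases 1+2: over every remaining task and machine, find the
                // globally minimum completion time CT = ET + ready.
                for (int i : meta) {
                    for (int j = 0; j < machines; j++) {
                        double ct = ET[i][j] + ready[j];
                        if (ct < bestCT) { bestCT = ct; bestTask = i; bestMachine = j; }
                    }
                }
                assignment[bestTask] = bestMachine;
                ready[bestMachine] = bestCT;         // machine is busy until then
                meta.remove(bestTask);
            }
            return assignment;
        }
    }

Note that ready[j] grows every time a task is assigned to machine j, so the fastest server does not win every round: once its backlog is long enough, the other servers offer earlier completion times. That is what keeps Min-Min from piling everything onto one machine, and it answers the load-balancing worry in the question.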
It's true that you can get an imbalanced load, because all the small tasks will be executed first. The Max-Min algorithm overcomes this drawback.
The Max-Min algorithm performs the same steps as Min-Min, but the main difference comes in the second phase: a task ti with the maximum completion time is selected (instead of the minimum, as in Min-Min) and assigned to the resource Rj that gives the minimum completion time.
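A matching sketch for Max-Min, under the same assumptions: the only change is the phase-2 selection, which first finds each task's minimum-completion-time machine and then picks the task whose minimum completion time is largest:

    import java.util.HashSet;
    import java.util.Set;

    public class MaxMin {
        static int[] schedule(double[][] ET) {
            int tasks = ET.length, machines = ET[0].length;
            double[] ready = new double[machines];
            int[] assignment = new int[tasks];
            Set<Integer> meta = new HashSet<>();
            for (int i = 0; i < tasks; i++) meta.add(i);

            while (!meta.isEmpty()) {
                int bestTask = -1, bestMachine = -1;
                double bestOuter = -1;                   // maximize over tasks
                for (int i : meta) {
                    int m = -1;
                    double minCT = Double.MAX_VALUE;     // minimize over machines
                    for (int j = 0; j < machines; j++) {
                        double ct = ET[i][j] + ready[j];
                        if (ct < minCT) { minCT = ct; m = j; }
                    }
                    if (minCT > bestOuter) { bestOuter = minCT; bestTask = i; bestMachine = m; }
                }
                assignment[bestTask] = bestMachine;
                ready[bestMachine] = bestOuter;
                meta.remove(bestTask);
            }
            return assignment;
        }
    }

Tasks arriving at different times can be approximated by re-running the selection over whichever tasks have already arrived whenever a server frees up, rather than over a fixed meta-task list.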

Job scheduling algorithm for cluster

I'm searching for an algorithm suitable for the problem below:
There are multiple computers (the exact number is unknown). Each computer pulls a job from a central queue, completes it, then pulls the next one. Jobs are produced by a group of users. Some users submit lots of jobs, some only a few. Jobs consume equal CPU time (not exactly, but it is a reasonable approximation).
The central queue should be fair when scheduling jobs. Also, users who submit lots of jobs should have some minimal share of the resources.
I'm searching for a good algorithm for this scheduling.
Considered two candidates:
A Hadoop-like fair scheduler. The problem here is: where do I get the minimal shares from when my cluster size is unknown?
Associate some penalty with each user. Increment the penalty when a user's job is scheduled. Use 1 - (normalized penalty) as the probability of scheduling that user's job. This is something like stride scheduling, but I could not find any good explanation of it. (A sketch of this idea follows below.)
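A minimal sketch of candidate 2, assuming penalties are normalized against the current maximum and shifted so every user keeps a positive weight (a small variation on 1 - normalized penalty); all names are illustrative:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public class PenaltyScheduler {
        private final Map<String, Double> penalty = new HashMap<>();
        private final Random rng = new Random();

        // Pick the user whose job runs next, weighted by (maxPenalty+1 - penalty),
        // so heavily-scheduled users are deprioritized but never fully starved.
        String pickUser(List<String> usersWithWaitingJobs) {
            if (usersWithWaitingJobs.isEmpty()) return null;
            double maxP = 0;
            for (String u : usersWithWaitingJobs)
                maxP = Math.max(maxP, penalty.getOrDefault(u, 0.0));
            double total = 0;
            double[] weights = new double[usersWithWaitingJobs.size()];
            for (int i = 0; i < weights.length; i++) {
                weights[i] = maxP + 1 - penalty.getOrDefault(usersWithWaitingJobs.get(i), 0.0);
                total += weights[i];
            }
            double r = rng.nextDouble() * total;     // roulette-wheel selection
            for (int i = 0; i < weights.length; i++) {
                r -= weights[i];
                if (r <= 0) return usersWithWaitingJobs.get(i);
            }
            return usersWithWaitingJobs.get(weights.length - 1); // numeric edge case
        }

        void onScheduled(String user) {
            penalty.merge(user, 1.0, Double::sum);   // increment on each scheduled job
        }
    }

This behaves like a randomized cousin of stride scheduling: each user's expected share shrinks as their penalty grows, and no fixed cluster size or minimal-share table is needed.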
When I implemented a very similar job runner (for a production system), I ended up having each server choose jobtypes at random. This was my reasoning:
a glut of jobs from one user should not impact the chance of other users having their jobs run (user-user fairness)
a glut of one jobtype should not impact the chance of other jobtypes being run (user-job and job-job fairness)
if there is only one jobtype from one user waiting to run, all servers should be running those jobs (no wasted capacity)
the system should run the jobs "fairly", i.e. proportionate to the number of waiting users and jobtypes and not the total waiting jobs (a large volume of one jobtype should not cause scheduling to favor it) (jobtype fairness)
the number of servers can vary, and is not known beforehand
the waiting jobs, jobtypes and users metadata is known to the scheduler, but not the job data (i.e., the usernames, jobnames and counts, but not the payloads)
I also wanted each server to be standalone, to schedule its own work autonomously without having to know about the other servers
The solution I settled on was to track the waiting jobs by their {user, jobtype} attribute tuple, and have each scheduling step randomly select 5 tuples and, from each tuple, up to 10 jobs to run next. The selected jobs were shortlisted to be run by the next available runner. Whenever capacity freed up to run more jobs (either because jobs finished or because secondary restrictions meant they could not run), another scheduling step ran to fetch more work.
Jobs were locked atomically as part of being fetched; the locks prevented them from being fetched again or participating in further scheduling decisions. If they failed to run, they were unlocked, effectively returning them to the pool. The locks timed out, so the server running a job was responsible for keeping its lock refreshed (if a server crashed, the others would time out its locks, then pick up and run the jobs it had started but not completed).
For my use case I wanted each of the four tuples A.1, A.2, A.3 (from user A) and B.1 (from user B) to get 25% of the resources (even though that means user A gets 75% to user B's 25%). Choosing randomly between the four tuples probabilistically converges to that 25%.
If you want users A and B to each have a 50-50 split of resources, with A's A.1, A.2 and A.3 together getting an equal share to B's B.1, you can run a two-level scheduler: randomly choose a user, then choose among that user's jobs (see the sketch at the end of this answer). That will distribute the resources among users equally, and within each user's jobs equally among the jobtypes.
A huge number of jobs of a particular jobtype will take a long time to all complete, but that's always going to be the case. By picking from across users then jobtypes the responsiveness of the job processing will not be adversely impacted.
There are lots of secondary restrictions that can be added (e.g., no more than 5 calls per second to LinkedIn), but the above is the heart of the system.
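A minimal sketch of the two-level selection described above (choose a user uniformly at random, then one of that user's jobtypes, then a job of that type); the class and method names are illustrative, and the locking/timeout machinery is left out:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public class TwoLevelScheduler {
        // waiting.get(user).get(jobtype) = queue of pending job ids
        private final Map<String, Map<String, Deque<String>>> waiting = new HashMap<>();
        private final Random rng = new Random();

        void submit(String user, String jobtype, String jobId) {
            waiting.computeIfAbsent(user, u -> new HashMap<>())
                   .computeIfAbsent(jobtype, t -> new ArrayDeque<>())
                   .add(jobId);
        }

        // Returns the next job id to run, or null if nothing is waiting.
        String next() {
            if (waiting.isEmpty()) return null;
            List<String> users = new ArrayList<>(waiting.keySet());
            String user = users.get(rng.nextInt(users.size()));         // level 1: user
            Map<String, Deque<String>> types = waiting.get(user);
            List<String> names = new ArrayList<>(types.keySet());
            String type = names.get(rng.nextInt(names.size()));         // level 2: jobtype
            Deque<String> queue = types.get(type);
            String job = queue.poll();
            if (queue.isEmpty()) types.remove(type);                    // drop empty queues
            if (types.isEmpty()) waiting.remove(user);
            return job;
        }
    }

Because the choice at each level is uniform over the currently waiting users and jobtypes (never over raw job counts), a glut of jobs from one user or of one jobtype does not shift anyone else's expected share, which is the fairness property described in the bullets above.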
You could try the Torque resource manager and the Maui batch job scheduler from Adaptive Computing. Maui's policies are flexible enough to fit your needs: it supports backfill, configurable job and user priorities, and resource reservations.
