How can I solve this problem using integer linear programming? - algorithm

We have four events x, y, z, w that could be run on a machine M. Each event needs 1/3 of the machine's capacity. Some of them cannot be run simultaneously in one batch (say x and y cannot be run in the same batch). How do we determine the minimum number of batches needed to run all the events? The time of the events does not matter, so the objective is the minimum number of batches.
My intuition is that I can formulate this as an integer linear program. Any ideas?

Based on what you described, the minimal number of events that can be run is 0, since you didn't mention any objective function, as noted by #cricket_007.
Then the maximal number of events is less than 2, since you mentioned that at least two events cannot be run simultaneously.
Finally, the definition of a batch is not given, so I'm not sure about that.
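If the objective is taken to be the number of batches, one possible formulation can be sketched with Google OR-Tools CP-SAT. This is only an illustration: the conflict list and the capacity of three events per batch (derived from the 1/3-capacity figure) are assumptions, not something stated in the question.

# Sketch of a batch-minimization model, assuming each batch holds at most
# 3 events (each event uses 1/3 of the machine) and the conflict pairs are known.
from ortools.sat.python import cp_model

events = ["x", "y", "z", "w"]
conflicts = [("x", "y")]           # assumed conflict pairs
max_batches = len(events)          # worst case: one event per batch

model = cp_model.CpModel()
# assign[e, b] = 1 if event e runs in batch b
assign = {(e, b): model.NewBoolVar(f"assign_{e}_{b}")
          for e in events for b in range(max_batches)}
# used[b] = 1 if batch b is run at all
used = [model.NewBoolVar(f"used_{b}") for b in range(max_batches)]

for e in events:                                   # each event in exactly one batch
    model.Add(sum(assign[e, b] for b in range(max_batches)) == 1)
for b in range(max_batches):
    model.Add(sum(assign[e, b] for e in events) <= 3)    # capacity: 3 events per batch
    for e in events:
        model.Add(assign[e, b] <= used[b])               # a batch must be opened to be used
    for (e1, e2) in conflicts:
        model.Add(assign[e1, b] + assign[e2, b] <= 1)    # conflicting events kept apart

model.Minimize(sum(used))                           # minimize the number of batches

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print("batches needed:", int(solver.ObjectiveValue()))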

Related

interval scheduling algorithm where overlap is allowed

I have stumbled upon a problem that looks similar to the classic interval scheduling problem.
However, in my case I do allow overlap between intervals; I just want to minimize it as much as possible.
Is there a canonical algorithm that solves this? I did not find a 'relaxed' alternative online.
I don't think this problem maps cleanly to any of the classical scheduling literature. I'd try to solve it as an integer program using (e.g.) OR-Tools.
What makes this problem hard is that the order of the jobs is unknown. Otherwise, we could probably write a dynamic program. Handling continuous time efficiently would be tricky but doable.
Similarly, the natural first attempt at a mixed integer programming formulation would be to have a variable for the starting time of each job, but the overlap function is horribly non-convex, and I don't see a good way to encode it using integer variables.
The formulation that I would try first would be to quantize time, then create a 0-1 variable x[j,s] for each valid (job, start time) pair. Then we write constraints to force each job to be scheduled exactly once:
for all j, sum over s of x[j,s] = 1.
As for the objective, we have a couple of different choices. I'll show one, but there's a lot of flexibility as long as one unit of time with i + j jobs running is worse than one unit with i jobs and a different unit with j jobs.
For each time t, we make a non-negative integer variable y[t] that will represent the number of excess jobs running at t. We write constraints:
for all t, -y[t] + sum over (j,s) overlapping t of x[j,s] ≤ 1.
Technically this constraint only forces y[t] to be greater than or equal to the number of excess jobs. But the optimal solution will take it to be equal, because of the objective:
minimize sum over t of y[t].
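A rough sketch of this quantized formulation with OR-Tools CP-SAT (the solver mentioned above); the time horizon, job durations and candidate start times are made-up illustrative data:

# Sketch of the quantized-time formulation described above (assumed data).
from ortools.sat.python import cp_model

horizon = 10
jobs = {"A": 4, "B": 3, "C": 5}    # job -> duration in time units (assumed)

model = cp_model.CpModel()
# x[j, s] = 1 if job j starts at time s
x = {(j, s): model.NewBoolVar(f"x_{j}_{s}")
     for j, d in jobs.items() for s in range(horizon - d + 1)}
# y[t] = number of "excess" jobs running at time t
y = [model.NewIntVar(0, len(jobs), f"y_{t}") for t in range(horizon)]

for j, d in jobs.items():           # each job is scheduled exactly once
    model.Add(sum(x[j, s] for s in range(horizon - d + 1)) == 1)

for t in range(horizon):            # sum of jobs covering t, minus y[t], is at most 1
    covering = [x[j, s] for j, d in jobs.items()
                for s in range(horizon - d + 1) if s <= t < s + d]
    model.Add(sum(covering) - y[t] <= 1)

model.Minimize(sum(y))              # total excess overlap

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for (j, s), var in x.items():
        if solver.Value(var):
            print(f"job {j} starts at t={s}")
    print("total overlap:", int(solver.ObjectiveValue()))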

NP-Hardness proof for constrained scheduling with staircase cost

I am working on a problem that appears like a variant of the assignment problem. There are tasks that need to be assigned to servers. The sum of costs over servers needs to be minimized. The following conditions hold:
Each task has a unit size.
A task may not be divided among more than one server. A task must be handled by exactly one server.
A server has a limit on the maximum number of tasks that may be assigned to it.
The cost function for task assignment is a staircase function. A server incurs a minimum cost 'a'. For each task handled by the server, the cost increases by 1. If the number of tasks assigned to a particular server exceeds half of its capacity, there is a jump in that server's cost equal to a positive number 'd' (see the sketch after this list).
Tasks have preferences, i.e., a given task may be assigned to one of a few of the servers.
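To make the staircase cost concrete, here is a tiny sketch (not necessarily the poster's exact model; the zero cost for an unused server is an assumption):

def server_cost(num_tasks, capacity, a, d):
    """Staircase cost of one server, as described above: a fixed cost a,
    +1 per assigned task, plus a jump of d once the load exceeds half the
    server's capacity. An unused server is assumed to cost nothing."""
    if num_tasks == 0:
        return 0
    cost = a + num_tasks
    if num_tasks > capacity / 2:
        cost += d
    return cost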
I have a feeling that this is an NP-Hard problem, but I can't seem to find an NP-Complete problem to map to it. I've tried Bin Packing, Assignment problem, Multiple Knapsacks, bipartite graph matching but none of these problems have all the key characteristics of my problem. Can you please suggest some problem that maps to it?
Thanks and best regards
Saqib
Have you tried reducing the set partitioning problem to yours?
The SET-PART (stands for "set partitioning") decision problem asks whether there exists a partition of a given set S of numbers into two sets S1 and S2, so that the sum of the elements in S1 equals the sum of elements in S2. This problem is known to be NP-complete.
Your problem seems related to the m-PROCESSOR decision problem. Given a nonempty set A of n>0 tasks {a1,a2,...,an} with processing times t1,t2,...,tn, the m-PROCESSOR problem asks if you can schedule the tasks among m equal processors so that all tasks finish in at most k>0 time steps. (Processing times are (positive) natural numbers.)
The reduction of SET-PART to m-PROCESSOR is very easy: first show that the special case, with m=2, is NP-complete; then use this to show that m-PROCESSOR is NP-complete for all m>=2. (A reduction in Slovene.)
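For illustration only, a tiny sketch of the m=2 direction: a SET-PART instance has an equal-sum partition exactly when its numbers, read as processing times, can be scheduled on 2 processors within k = sum/2 time steps. The brute-force feasibility check below is exponential and only there to make the mapping explicit; the example instance is made up.

from itertools import product

def set_part_to_2_processor(S):
    """Map a SET-PART instance S to a 2-PROCESSOR instance (sketch):
    processing times are the numbers themselves, deadline k = sum(S)/2."""
    total = sum(S)
    if total % 2:
        return None                  # odd total: no equal partition possible
    return list(S), 2, total // 2    # (tasks, m processors, deadline k)

def two_processor_feasible(times, k):
    """Exhaustive check: can the tasks be split over 2 processors so that
    both finish within k steps? Exponential; for illustration only."""
    for labels in product((0, 1), repeat=len(times)):
        loads = [0, 0]
        for t, proc in zip(times, labels):
            loads[proc] += t
        if max(loads) <= k:
            return True
    return False

S = [3, 1, 1, 2, 2, 1]               # example instance (assumed)
inst = set_part_to_2_processor(S)
print(inst is not None and two_processor_feasible(inst[0], inst[2]))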
Hope this helps.
EDIT 1: Oops, this m-PROCESSOR thingy seems very similar to the assignment problem.

Parallel Normal Distributions

I'm working on a simulation where a large task is completed by a series of independent smaller tasks, either in parallel or in series. A smaller task's completion time follows a normal distribution with a mean time, say "t", and a variance, say "v". I understand that if this task is repeated in series, say "n" times, then the new total time distribution is normal with mean t*n and variance v*n, which is nice, but I don't know what happens to the mean and variance if a set of the same tasks is done simultaneously/in parallel; it's been a while since prob/stat class. Is there a nice/fast way to find the new time distribution for "n" of these independent normally distributed tasks done in parallel?
If the tasks are undertaken independently and in parallel, the distribution of time until completion depends on the time of the longest process.
Unfortunately, the max function doesn't have particularly nice properties for theoretical analysis, but if you're already simulating there's an easy way to do it. For each subprocess i with mean t_i and variance v_i, draw time until completion for each i independently then look at the biggest. Repeating this lots of times will give you a bunch of samples from the max distribution you're interested in: you can compute the expectation (average), variance, or whatever you want.
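A small numpy sketch of that simulation approach (the per-task means and variances are placeholder values):

import numpy as np

rng = np.random.default_rng(0)
t = np.array([5.0, 6.0, 4.5])       # per-task mean times (assumed)
v = np.array([1.0, 0.5, 2.0])       # per-task variances (assumed)

n_samples = 100_000
# one completion time per task per simulation run
draws = rng.normal(loc=t, scale=np.sqrt(v), size=(n_samples, len(t)))
parallel_time = draws.max(axis=1)   # parallel: done when the slowest task finishes

print("mean of max:", parallel_time.mean())
print("variance of max:", parallel_time.var())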
The question is, what is the distribution of the maximum (greatest value) of the random completion times? The distribution function (i.e. the integral of the probability density up to a given point) of the maximum of a collection of independent random variables is just the product of the distribution functions of the individual variables. (The distribution function of the minimum is just 1 - (product of (1 - distribution function)).)
If you want to find a time such that probability(maximum > time) = (some given value), you might be able to solve that exactly, or resort to a numerical method. Still, solving the equation numerically (e.g. bisection method) is much faster and more accurate than a Monte Carlo method, as you mentioned you have already tried.
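A sketch of that numerical route with scipy, assuming n i.i.d. normal task times (the values of t, v and n are placeholders); brentq does the root finding:

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

t, v, n = 5.0, 1.0, 10               # assumed: n i.i.d. normal task times

def cdf_max(x):
    """P(max of n independent completion times <= x): product of the CDFs."""
    return norm.cdf(x, loc=t, scale=np.sqrt(v)) ** n

# Find the time x such that P(max > x) = 0.05, i.e. cdf_max(x) = 0.95.
x95 = brentq(lambda x: cdf_max(x) - 0.95, t - 10, t + 10)
print("95th percentile of the parallel completion time:", x95)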
This isn't exactly a programming problem, but what you're looking for are the distributions of order statistics of normal random variables, i.e., the expected value/variance/etc of the job that took the longest, shortest, etc. This is a solved problem for identical means and variances, because you can scale all the random variables to the standard normal distribution, which has been analyzed.
Here's the paper that gives you the answer, though you're going to need some math knowledge to understand it:
Algorithm AS 177: Expected Normal Order Statistics (Exact and Approximate) J. P. Royston. Journal of the Royal Statistical Society. Series C (Applied Statistics) Vol. 31, No. 2 (1982), pp. 161-165
See this post on stats.stackexchange for more information.

Is the busy beaver function unique for an n-state busy beaver game?

For a given n-state busy beaver game, is the busy beaver function unique, or might there be multiple functions with the same maximum score? Perhaps it has not been proven either way?
Yes, it is.
The busy beaver function is defined so that
\Sigma(n) = max { \sigma(M) | M is a halting n-state 2-symbol Turing machine}
The maximum is unique if it exists, which it does (Rado proved this). This is just a number.
Therefore \Sigma(n) is also unique, and so the discrete function \Sigma: N --> N is also unique. There may be multiple ways to extend \Sigma to a continuous function, but why someone would want to do this is beyond me.
It's possible to compute small values of \Sigma; check out the OEIS entry for the largest known values.
As #PengOne pointed out, the function is indeed unique. It is a completely defined N -> N discrete function.
However, from your formulation ("or might there be multiple functions with the same maximum score") it can also be understood that you want to know whether there are multiple busy-beavers that give the same maximum. If that is the case, then yes, there are at least 2 busy-beavers given an N, one is constructed from the other by simply reversing the shifts.
This was asked a long time ago, but I found this interesting: http://www.win.tue.nl/~wijers/shallit.pdf
Also, I coded an algorithm that brute forces the 3-state busy-beaver problem, and it gave me about 22 non-symmetrical configurations that produced 6 symbols (consecutive or not). This means there are perhaps 60-some configurations if you consider that you can swap state 1 and state 2, as well as invert the first transition.
But that's only for the amount of symbols produced, not the 'longest execution' one.
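For reference, a minimal brute-force sketch in the same spirit (the step limit and the halting convention are assumptions; 2 states enumerates quickly, 3 states already means millions of machines, so treat this purely as an illustration):

from itertools import product

def run(machine, step_limit=100):
    """Simulate a 2-symbol Turing machine given as a dict
    (state, symbol) -> (write, move, next_state), where next_state -1 means halt.
    Returns the number of 1s on the tape if it halts within step_limit, else None."""
    tape, pos, state = {}, 0, 0
    for _ in range(step_limit):
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == -1:                       # halting transition taken
            return sum(tape.values())
        state = nxt
    return None                             # assumed non-halting within the limit

def busy_beaver(n_states, step_limit=100):
    """Brute-force lower bound on Sigma(n): enumerate every transition table
    and keep the best halting score (a sketch; the step limit is an assumption)."""
    entries = list(product((0, 1), (-1, 1), list(range(n_states)) + [-1]))
    keys = [(s, sym) for s in range(n_states) for sym in (0, 1)]
    best = 0
    for table in product(entries, repeat=len(keys)):
        score = run(dict(zip(keys, table)), step_limit)
        if score is not None and score > best:
            best = score
    return best

print(busy_beaver(2))   # the known value is Sigma(2) = 4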

Prescheduling Recurrent Tasks

At work, we are given a set of constraints of the form (taskname, frequency), where frequency is an integer giving the number of ticks between successive invocations of the task "taskname". Two tasks cannot run concurrently, and each task invocation takes one tick to complete. Our goal is to find the schedule that best matches the set of constraints.
For example, if we are given the constraints {(a, 2), (b,2)} the best schedule is "ab ab ab..."
On the other hand, if we are given the constraints ({a,2}, {b, 5}, {c, 5}) the best schedule is probably "abaca abaca abaca..."
Currently we find the best schedule by running a genetic algorithm which tries to minimize the distance between the actual frequencies and the given constraints. It actually works pretty well, but I wonder if there's some algorithm which better suits this kind of problem. I've tried to search Google but I seem to lack the right words (scheduling is usually about completing tasks :(). Can you help?
First off, consider the merits of jldupont's comment! :)
Second, I think 'period' is the accurate description of the second element of the tuple, e.g. {Name, Period[icity]}.
That said, look to networking algorithms. Some variant of weighted queuing is probably applicable here.
For example, given N tasks, create N queues corresponding to tasks T0...Tn, and on each cycle ("tick"), based on the period of each task, enqueue an item to the corresponding queue.
The scheduler algorithm would then aim to minimize (on average) the total number of waiters in the queues. A simple starting point would be to simply dequeue from the queue Qx which has the highest number of items. (A parameter on each queued item indicating its 'age' would assist in prioritization.)
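A rough sketch of that queue-based idea (the example periods and the tie-breaking by age are assumptions):

from collections import deque

def schedule(tasks, n_ticks):
    """Queue-based scheduler sketch: tasks maps name -> period.
    Each tick, every task whose period has elapsed is enqueued, and the
    scheduler runs one item from the fullest queue (ties broken by oldest head)."""
    queues = {name: deque() for name in tasks}
    timeline = []
    for tick in range(n_ticks):
        for name, period in tasks.items():
            if tick % period == 0:
                queues[name].append(tick)           # stored tick acts as the 'age'
        candidates = [q for q in queues if queues[q]]
        if not candidates:
            timeline.append(".")                    # idle tick
            continue
        pick = max(candidates, key=lambda q: (len(queues[q]), -queues[q][0]))
        queues[pick].popleft()
        timeline.append(pick)
    return "".join(timeline)

print(schedule({"a": 2, "b": 5, "c": 5}, 15))       # periods from the question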
