Count number of killed processes - sorting

We have a total of n processes on a computer, and a new process, denoted by x, is waiting in a queue.
We are also given the memory occupied by each of the n processes.
The task is to find the minimum number of processes that must be killed to make room for the new process.
Suppose n = 5,
x = 9 (the memory size required by the new process),
and the memory occupied by the 5 processes is 2 1 3 4 5.
If we remove 4 and 5 (4 + 5 = 9), the minimum count is 2.
I have tried this in O(n^2), but I want an optimized solution.
Please suggest one.

Well, if we are allowed to manipulate the process list that was initially given to us, you can simply sort the list of processes in descending order of memory usage. The problem then boils down to iterating over the list and summing the memory allocation of each process you encounter. When the sum reaches or exceeds x (i.e. the allocation required for the new process), you return the current index + 1, which is the number of processes you have encountered (and killed) so far.
Below is sample Python code implementing this.
def solve(processes, x):
    processes.sort(reverse=True)
    sum_of_freed_memory = 0
    for i in range(len(processes)):
        sum_of_freed_memory += processes[i]
        if sum_of_freed_memory >= x:
            return i + 1
    return -1  # to not crash if/when total memory is smaller than x
For the example in the question, solve([2, 1, 3, 4, 5], 9) returns 2, the output you desire.
Note that even if we could not manipulate the original list given as input, as long as we are allowed to use O(n) space in our solution, we may copy the initial list and run the same algorithm on that copy. In terms of code, replacing the first line of the function solve() with the line given below would do it.
processes = sorted(processes, reverse=True)
The approach this code takes is greedy. At each step, it reasons in the following manner: if I have to kill one more process, then among the ones still running, let's kill the one using the largest memory, so that the chances of freeing sufficient memory are the highest possible. In other words, if there is a way of freeing enough space by killing one more process, killing the one using the largest memory will do it, whereas killing some other process may be insufficient. Although it is not that formal a proof, I believe this reasoning explains why the algorithm works.
The sorting costs O(N log N) and the traversal is O(N), so the overall complexity of the algorithm is O(N log N).

Related

How to get N greatest elements out of M elements using CUDA, where N << M?

I am just wondering whether there are any efficient ways of getting the N greatest elements out of M elements, where N is much smaller than M (e.g. N = 10 and M = 1000), using the GPU.
The problem is that, due to the large size of the input data, I really do not want to transfer the data from the GPU to the CPU and then get it back. However, exact sorting does not seem to work well because of thread divergence and the time wasted on sorting elements that we do not really care about (in the case above, the don't-care elements are those ranked 11 through 1000).
If N is small enough that the N largest values can be kept in shared memory, that would allow a fast implementation that only reads through your array of M elements in global memory once and then immediately writes out these N largest values. Implementation becomes simpler if N also doesn't exceed the maximum number of threads per block.
Contrary to serial programming, I would not use a heap (or other more complicated data structure), but just a sorted array. There is plenty of parallel hardware on an SM that would go unused when traversing a heap. The entire thread block can be used to shift the elements of the shared memory array that are smaller than the newly incoming value.
If N<=32, a neat solution is possible that keeps a sorted list of the N largest numbers in registers, using warp shuffle functions.
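As a concrete illustration of the shared-memory approach above, here is a minimal CUDA C++ sketch (the kernel name and launch configuration are illustrative, and it assumes a single thread block with blockDim.x >= N). The block keeps the N largest values seen so far in a descending sorted array in shared memory; for each incoming element, all threads cooperate to insert it and shift the smaller entries down one slot.

__global__ void topNKernel(const float* data, int M, float* out, int N)
{
    extern __shared__ float best[];            // best[0..N-1], sorted descending
    int tid = threadIdx.x;

    for (int j = tid; j < N; j += blockDim.x)  // initialize to -infinity
        best[j] = -INFINITY;
    __syncthreads();

    for (int i = 0; i < M; ++i) {
        float v = data[i];                     // same address for all threads: broadcast read
        if (v <= best[N - 1]) continue;        // uniform branch: skip values that can't enter
        // Read the old neighborhood, then overwrite in parallel: each slot
        // keeps its value, receives v, or takes the value shifted from its left.
        float left = (tid > 0 && tid < N) ? best[tid - 1] : 0.0f;
        float cur  = (tid < N) ? best[tid] : 0.0f;
        __syncthreads();
        if (tid < N) {
            if (cur >= v)                   best[tid] = cur;   // unaffected prefix
            else if (tid == 0 || left >= v) best[tid] = v;     // insertion point
            else                            best[tid] = left;  // shifted suffix
        }
        __syncthreads();
    }
    for (int j = tid; j < N; j += blockDim.x)  // write the N largest to global memory
        out[j] = best[j];
}

A launch might look like topNKernel<<<1, 128, N * sizeof(float)>>>(d_data, M, d_out, N). The warp-shuffle variant for N <= 32 would keep the sorted list in registers instead of shared memory.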

Resource allocation algorithm

I know the algorithm exists, but I am having problems naming it and finding a suitable solution.
My problem is as follows:
I have a set of J jobs that need to be completed.
All jobs take different times to complete, but the time is known.
I have a set of R resources.
Each resource R may have any number of instances, from 1 to 100.
A Job may need to use any number of resources R.
A job may need to use multiple instances of a resource R, but never more instances than the resource has (if a resource only has 2 instances, a job will never need more than 2).
Once a job completes it returns all instances of all resources it used back into the pool for other jobs to use.
A job cannot be preempted once started.
As long as resources allow, there is no limit to the number of jobs that can simultaneously execute.
This is not a directed-graph problem; the jobs J may execute in any order, as long as they can claim their resources.
My Goal:
The most optimal way to schedule the jobs to minimize run time and/or maximize resource utilization.
I'm not sure how good this idea is, but you could model this as an integer linear program, as follows (not tested).
Define some constants,
Use[j,i] = amount of resource i used by job j
Time[j] = length of job j
Capacity[i] = amount of resource i available
Define some variables,
x[j,t] = job j starts at time t
r[i,t] = amount of resource of type i used at time t
slot[t] = is time slot t used
The constraints are,
// every job must start exactly once
(1). for every j, sum[t](x[j,t]) = 1
// a resource can only be used up to its capacity
(2). r[i,t] <= Capacity[i]
// if a job is running, it uses resources
(3). for every i and t: r[i,t] = sum[j,s | s <= t < s + Time[j]] (x[j,s] * Use[j,i])
// if a job is running, then the time slot is used
(4). slot[t] >= x[j,s] for all j, s, t with s <= t < s + Time[j]
The third constraint means that if a job was started recently enough that it's still running, then its resource usage is added to the currently used resources. The fourth constraint means that if a job was started recently enough that it's still running, then this time slot is used.
The objective function is the weighted sum of slots, with higher weights for later slots, so that it prefers to fill the early slots. In theory the weights must increase exponentially to ensure using a later time slot is always worse than any configuration that uses only earlier time slots, but solvers don't like that and in practice you can probably get away with using slower growing weights.
You will need enough slots such that a solution exists, but preferably not too many more than you end up needing, so I suggest you start with a greedy solution to give you a hopefully non-trivial upper bound on the number of time slots (obviously there is also the sum of the lengths of all tasks).
There are many ways to get a greedy solution, for example just schedule the jobs one by one in the earliest time slot it will go. It may work better to order them by some measure of "hardness" and put the hard ones in first, for example you could give them a score based on how badly they use a resource up (say, the sum of Use[j,i] / Capacity[i], or maybe the maximum? who knows, try some things) and then order by that score in decreasing order.
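As an illustration, here is a sketch of such a greedy in C++ (illustrative names; it assumes time is discretized into maxSlots slots and that maxSlots is large enough). Jobs are ordered by the sum-of-ratios hardness score suggested above, then each job is dropped into the earliest window of slots where every resource still fits.

#include <algorithm>
#include <numeric>
#include <vector>

struct Job {
    int length;               // Time[j]
    std::vector<int> use;     // Use[j][i] for each resource i
};

std::vector<int> greedySchedule(const std::vector<Job>& jobs,
                                const std::vector<int>& capacity,
                                int maxSlots)
{
    int R = capacity.size();
    // used[i][t] = amount of resource i consumed at slot t
    std::vector<std::vector<int>> used(R, std::vector<int>(maxSlots, 0));

    // hardness score: sum of Use[j][i] / Capacity[i] (one of the measures
    // suggested above; the maximum ratio would also be worth trying)
    auto score = [&](int j) {
        double s = 0;
        for (int i = 0; i < R; ++i)
            s += double(jobs[j].use[i]) / capacity[i];
        return s;
    };
    std::vector<int> order(jobs.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return score(a) > score(b); });

    std::vector<int> start(jobs.size(), -1);
    for (int j : order) {
        for (int s = 0; s + jobs[j].length <= maxSlots; ++s) {
            bool fits = true;
            for (int t = s; fits && t < s + jobs[j].length; ++t)
                for (int i = 0; fits && i < R; ++i)
                    if (used[i][t] + jobs[j].use[i] > capacity[i])
                        fits = false;
            if (fits) {
                start[j] = s;
                for (int t = s; t < s + jobs[j].length; ++t)
                    for (int i = 0; i < R; ++i)
                        used[i][t] += jobs[j].use[i];
                break;
            }
        }
    }
    return start;   // start[j] == -1 means maxSlots was too small
}

The latest finish time, i.e. the maximum over j of start[j] + length, then gives the upper bound on the number of slots the ILP needs.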
As a bonus, you may not always have to solve the full ILP problem (which is NP-hard, so sometimes it can take a while), if you solve just the linear relaxation (allowing the variables to take fractional values, not just 0 or 1) you get a lower bound, and the approximate greedy solutions give upper bounds. If they are sufficiently close, you can skip the costly integer phase and take a greedy solution. In some cases this can even prove the greedy solution optimal, if the rounded-up objective from the linear relaxation is the same as the objective of the greedy solution.
This might be a job for Dijkstra's algorithm. For your case, if you want to maximize resource utilization, then each node in the search space is the result of adding a job to the list of jobs you'll do at once. The edges are then the resources which are left when you add a job to that list.
The goal, then, is to find the path to the node whose incoming edge has the smallest value.
An alternative, which is more straight forward, is to view this as a knapsack problem.
To construct this problem as an instance of The Knapsack Problem, I'd do the following:
Assuming I have J jobs j_1, j_2, ..., j_n and R resources, I want to find the subset of J such that, when that subset is scheduled, R is minimized (I'll call that subset J').
in pseudo-code:
def knapsack(J, R, J'):
    potential_solutions = []
    for j in J:
        if R > resources_used_by(j):
            potential_solutions.push( knapsack(J - j, R - resources_used_by(j), J' + j) )
        else:
            return J', R
    return best_solution_of(potential_solutions)

Is there an exact algorithm for the minimum makespan scheduling with 2 identical machines and N processes that exists for small constraints?

Given 2 identical machines and N jobs, where the i-th job takes T[i] time to complete, is there an exact algorithm to assign these N jobs to the 2 machines so that the makespan, i.e. the total time required to complete all N jobs, is minimized?
I need to solve the problem only for N=50.
Also note that total execution time of all the processes is bounded by 10000.
Does greedily allocating the largest remaining job to the machine that becomes free first work?
// s1 -> machine 1, s2 -> machine 2, a[i] -> job i, time -> time consumed;
// jobs sorted in descending order and allocated one by one
// to the machine which is free.
long long ans = INT_MAX;
sort(a, a + n);
reverse(a, a + n);
int i = 2;
int s1 = a[0];
int s2 = a[1];
long long time = min(s1, s2);
s1 -= time;
s2 -= time;
while (i < n)
{
    if (s1 == 0 && s2 == 0)
    {
        s1 = a[i];
        if (i + 1 < n) s2 = a[i + 1];
        int c = min(s1, s2);
        time += c;
        s1 -= c;
        s2 -= c;
        i += 2;
        continue;
    }
    else
    {
        if (s1 < s2) swap(s1, s2);
        s2 = a[i];
        int c = min(s1, s2);
        time += c;
        s1 -= c;
        s2 -= c;
        i++;
    }
}
assert(s1 * s2 == 0);
ans = min(ans, time + max(s1, s2));
The problem you described is NP-hard via a more or less straightforward reduction from Subset Sum, which makes an exact polynomial-time algorithm impossible unless P=NP. Greedy assignment will not yield an optimal solution in general. However, as the number of jobs is bounded by 50, any exact algorithm with running time exponential in N is in fact an algorithm with constant running time.
The problem can be tackled via dynamic programming as follows. Let P be the sum of all processing times, which is an upper bound for the optimal makespan. Define an array S[N][P] as state space, where S[i][j] indicates whether jobs 1,...,i can be scheduled such that the load of machine 1 is exactly j. An outer loop iterates over the jobs, an inner loop over the target load of machine 1. In each iteration we decide whether job i should run on machine 1 or on machine 2, taking only states which actually exist into account.
If job i runs on machine 1, the previous load of machine 1 must have been j - T[i], so S[i][j] is reachable whenever S[i-1][j-T[i]] is. If job i runs on machine 2, the load of machine 1 is unchanged, so S[i][j] is reachable whenever S[i-1][j] is. Note that the load of machine 2 never has to be stored: it is the sum of T[i'] for i' in {1,...,i} minus j, so to speak the complement of the load of machine 1.
Finally, the optimal makespan can be found by taking the minimum of max(j, P - j), the larger of the two machine loads, over all j for which S[N][j] is reachable. Note that the approach only calculates the optimum value, but not an optimal solution itself. An optimal solution can be found by backtracking or using suitable auxiliary data structures. The running time and space requirement would be O(N*P), i.e. pseudopolynomial in N.
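A compact sketch of this DP in C++ (with the job index rolled into a single dimension, as in the usual subset-sum table; minMakespan is an illustrative name):

#include <algorithm>
#include <numeric>
#include <vector>

// minimum makespan on two identical machines in O(N * P) time and space;
// here P = sum of T, which the problem bounds by 10000
int minMakespan(const std::vector<int>& T)
{
    int P = std::accumulate(T.begin(), T.end(), 0);
    // reach[j] = true if some subset of the jobs seen so far
    // loads machine 1 with exactly j (a rolling version of S[i][j])
    std::vector<char> reach(P + 1, 0);
    reach[0] = 1;
    for (int t : T)
        for (int j = P; j >= t; --j)   // iterate backwards so each job is used once
            if (reach[j - t])
                reach[j] = 1;
    int best = P;
    for (int j = 0; j <= P; ++j)
        if (reach[j])
            best = std::min(best, std::max(j, P - j));
    return best;
}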
Note that the problem and the approach are very similar to the Knapsack problem. However, for the scheduling problem the choice to be made is not whether or not to include an item, but whether to execute a job on machine 1 or on machine 2.
Also note that the problem is actually well-studied; its description in so-called three-field notation is P2||Cmax. If I recall correctly, greedily scheduling jobs in non-increasing order of processing time yields an approximation ratio of 2, as proved in the following article.
R.L. Graham, "Bounds for certain multiprocessing anomalies," Bell System Technical Journal 45 (1966) 1563-1581

How do I minimize the amount of memory needed for a fixed allocation scheme?

The following image visualizes the needed life span for 16 memory blocks of various sizes:
What I'm essentially looking for is: given N blocks, each with a size size_i and a lifetime [begin_i, end_i), return the minimum-sized total memory block needed to contain them during our total time interval, together with N offsets, offset_i, into this total memory block for the input blocks.
A trivial, non-optimal algorithm would be the following:
int offsets[N];
offsets[0] = 0;
int total_size = size[0];
for (int i = 1; i < N; ++i)
{
    offsets[i] = offsets[i - 1] + size[i - 1];
    total_size += size[i];
}
Our current algorithm is to sort the blocks by size and then process them from largest to smallest, finding the first offset where the block doesn't overlap with an already "allocated" block. This is essentially a greedy algorithm, so I have a feeling that it would be possible to do better.
The algorithm only needs to be run once at the start of the application so it doesn't have to be superfast. The number of allocations is in the order of 10-50 and for our purposes the time can be discretized into around 50 fixed size units.
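For reference, the sort-by-size greedy described above might look like the following C++ sketch (illustrative names; offsets and times are assumed to be small integers). Each block receives the lowest offset at which it overlaps no already-placed block whose lifetime intersects its own; bumping past a conflicting block is the smallest move that clears it, so the scan finds the first fit.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Block { int size, begin, end; };   // lifetime is [begin, end)

int placeBlocks(const std::vector<Block>& blocks, std::vector<int>& offset)
{
    // process blocks from largest to smallest
    std::vector<int> order(blocks.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return blocks[a].size > blocks[b].size; });

    offset.assign(blocks.size(), -1);     // -1 = not placed yet
    int total = 0;
    for (int b : order) {
        int pos = 0;
        bool moved = true;
        while (moved) {                   // bump pos past every conflict
            moved = false;
            for (std::size_t p = 0; p < blocks.size(); ++p) {
                if (offset[p] < 0 || (int)p == b) continue;
                bool timeOverlap = blocks[p].begin < blocks[b].end &&
                                   blocks[b].begin < blocks[p].end;
                bool memOverlap  = offset[p] < pos + blocks[b].size &&
                                   pos < offset[p] + blocks[p].size;
                if (timeOverlap && memOverlap) {
                    pos = offset[p] + blocks[p].size;   // jump past this block
                    moved = true;
                }
            }
        }
        offset[b] = pos;
        total = std::max(total, pos + blocks[b].size);
    }
    return total;   // the arena size this heuristic needs
}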
Find the minimum start time and maximum end time from your list of begin and end time intervals. This is the total interval, (t_min, t_max), of time you are interested in. Next, divide the time interval into some discrete and uniform intervals. Let the length of this interval be u. This is basically the maximum resolution of your memory management (how often you can possibly free and/or claim a block of memory).
For each unit of time, determine which allocation IDs need memory at that time and what size each of them requires; call it s(t, id). The maximum over t of the sum of s(t, id) over all allocation IDs is a lower bound on how much total memory you require. You can't do any better than that bound, although it fails to take into account the desire to keep each block at a fixed offset rather than moving it at each time step.
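Computing that lower bound is cheap. A small C++ sketch, reusing the block representation from the sketch above (names illustrative):

#include <algorithm>
#include <vector>

struct Block { int size, begin, end; };   // lifetime is [begin, end)

// max over t of the summed size of all blocks alive at t: no layout that
// keeps every block resident for its whole lifetime can use less memory
int memoryLowerBound(const std::vector<Block>& blocks, int t_min, int t_max)
{
    int bound = 0;
    for (int t = t_min; t < t_max; ++t) {
        int live = 0;
        for (const Block& b : blocks)
            if (b.begin <= t && t < b.end)
                live += b.size;
        bound = std::max(bound, live);
    }
    return bound;
}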
To find an optimal position for each item you could use a heuristic search. Basically, search the state space of all possible starting addresses for each memory block for the solution that takes up the smallest total amount of memory, which you find by simulating the progression of time from t_min to t_max.
A heuristic that might be worth trying is to prefer allocations where big chunks occupy spaces previously occupied by other big chunks, and small chunks are placed at locations with small contribution to the maximum memory usage of the strategy. You could also prune any strategy found that is worse than the best seen so far since the maximum memory claimed by the strategy is monotonic over time.
The heuristic search method may be slow, but it sounds like you care more about optimal memory usage than runtime of the allocation algorithm.

Floyd loop detection algorithm with different step size

In Floyd's loop detection algorithm for a linked list, we generally advance the slow pointer by 1 node and the fast pointer by 2 nodes. What other values can we use to increment the slow and fast pointers, and how do they change the complexity of the algorithm?
The two pointers will always meet, regardless of speeds or loop size.
Using the following values:
a and b: The number of steps taken by each pointer for each iteration.
m: The number of nodes in the loop.
After i iterations, the two pointers will have taken a·i and b·i steps. They will be at the same node if i is large enough that both pointers are inside the loop, and:
a·i ≡ b·i (mod m)
which is the same as:
(a - b)·i ≡ 0 (mod m)
This holds for any i which is a multiple of m, and a sufficiently large such i always exists, so the pointers will always meet.
Larger values of a and b will increase the number of steps taken per iteration, but if they are both constants then the complexity will still be linear.
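As a quick illustration, here is a C++ sketch of the generalized detection loop (Node and the helper are illustrative; it assumes constant step sizes with 1 <= a < b, so the two pointers can only coincide inside a loop):

#include <cstddef>

struct Node { int val; Node* next; };

// advance p by k nodes, stopping early at the end of the list
Node* advance(Node* p, int k)
{
    while (p != nullptr && k-- > 0) p = p->next;
    return p;
}

// each iteration costs O(a + b) pointer hops, so for constant a and b
// the total running time stays linear in the length of the list
bool hasCycle(Node* head, int a, int b)
{
    Node* slow = head;
    Node* fast = head;
    do {
        slow = advance(slow, a);
        fast = advance(fast, b);
        if (fast == nullptr) return false;   // ran off the end: no loop
    } while (slow != fast);
    return true;                             // pointers met inside the loop
}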
I think the step size does not matter: as long as slow < fast, the two will meet if there is a cycle in the list.
The only difference would be that in each iteration the number of steps taken by each pointer would vary.
Well, I understood it in an intuitive way using some basic maths. Imagine a linked list with a loop; both the slow pointer and the fast pointer start moving.
Let T be the point where the loop starts, i.e. the node where the list connects back to itself.
When the slow pointer reaches this node, the fast pointer is already inside the loop. Now picture the loop as a clock with an hour hand and a minute hand: the two pointers will meet, irrespective of their speeds, at common multiples of their speeds.
