Booking System is NP-Complete

I have to show that the following problem is NP-Complete and need some helpful hints on how to proceed.
The problem:
We're looking at a meeting booking system. The input is a list of n possible meeting times and m lists (where m <= n), one per person, each containing that person's choice of possible meeting times. For each time in a person's list, a priority number is also given, and for each of the n reservation times a cost is given (the cost of booking the room). The algorithm should assign times so that the combined priority of the granted bookings is as small as possible, while the total cost of booking does not exceed a budget K.
NP
So first, to show that it's in NP, we should show that a given solution can be verified to be correct. I guess the verifier should check that the cost is below the threshold K, and that the priority of the solution is indeed the minimum, both of which can be done in polynomial time, I assume. We traverse the lists of people, assert that each one has been granted a time, add up the cost in a variable, and at the end of the list assert that the cost is below K. The priority can be dealt with in similar fashion, I suppose?
NP Hard
Then, to show it's NP-hard, I can use the Knapsack problem, since the two are rather similar. Knapsack's input is a bag of size S, a list of items with weights w and values v, and a goal value W. I guess it's clear that S can correspond to the cost and W to the priority? We want the total weight to stay below S, i.e. the same kind of condition as above where the cost has to stay below K. And while in Knapsack the total value should meet or exceed W, in our case we want the priority to be as low as possible, which seems doable.
I'm afraid I might've gone the wrong way when it comes to verifying the problem. Also, the reduction to show it's NP-hard is perhaps not thought out all the way. Some pointers would be very helpful! Thanks

NP
When you are proving the problem is in NP, you must first turn your problem into a decision problem. Then you can verify your certificate in polynomial time as you started to describe.
NP Hard
You need to transform the Knapsack problem into your meeting problem. You are going the right way by mapping Knapsack's size and value onto the meeting problem's cost and priority. Once you figure out the transformation, you must verify that it can be done in polynomial time. Finally, you show that a solution to the Knapsack instance is a solution to the meeting instance and vice versa.
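For concreteness, here is a minimal sketch of such a verifier in Python. Everything here is an assumed encoding, not part of the original problem statement: the decision version is taken to ask "is there an assignment with total priority at most P and total cost at most K?", and one meeting per slot is assumed.

```python
# Hypothetical verifier for the decision version of the booking problem.
# The certificate is a proposed assignment of one time slot per person.

def verify(assignment, choices, priority, cost, P, K):
    """assignment: dict person -> time slot (the certificate)
    choices:  dict person -> set of that person's acceptable slots
    priority: dict (person, slot) -> priority number from their list
    cost:     dict slot -> cost of booking the room at that time
    P, K:     the bounds of the decision version"""
    # Every person must get a slot they actually listed.
    for person, slot in assignment.items():
        if slot not in choices[person]:
            return False
    # Assumed constraint: no slot hosts two different meetings.
    if len(set(assignment.values())) != len(assignment):
        return False
    # Check both bounds. Note the verifier only checks "<= P";
    # it never has to establish that P is the true minimum.
    total_priority = sum(priority[p, s] for p, s in assignment.items())
    total_cost = sum(cost[s] for s in assignment.values())
    return total_priority <= P and total_cost <= K
```

Each check is a single pass over the certificate, so verification is clearly polynomial; not having to confirm that the priority is the minimum is exactly what turning the optimisation problem into a decision problem buys you.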

Related

Dual knapsack algorithm

Say you have a warehouse of fragile goods (e.g. vegetables or fruit), and you can take a container out only once: if you move it twice, the contents rot too fast and can't be sold anymore.
So you give every container of vegetables a value (depending on how long it will stay fresh), and you want to sell the lowest values first. And when a client asks for a certain weight, you want to deliver good service and hand over exactly that weight (so you may need to take a bit extra out of your warehouse and throw the surplus away after selling).
I don't know if this problem has a name, but I would consider it the dual form of the knapsack problem. In the knapsack problem you want to maximise the value while limiting the weight to a maximum; here you want to minimise the value while limiting the weight to a minimum.
You can easily see this duality by treating the warehouse itself as the knapsack and optimising it for maximum value, with the weight limited to a maximum of the current total weight minus what the client asks for.
However, many practical algorithms for the knapsack problem rely on the assumption that the weight you can carry is small compared to the total weight you can choose from. E.g. the dynamic-programming 0/1 solution loops up to the maximum weight, and the FPTAS guarantees a value within a factor of (1 - ε) of the optimum (but a small fraction of a huge value can still be a pretty big difference).
So both have issues when the wanted weight is big.
As such, I wondered if anyone studied the "dual knapsack problem" already (if some literature can be found around it), or if there's some easy modification to the existing algorithms that I'm missing.
The usual pseudopolynomial DP algorithm for solving knapsack asks, for each i and w, "What is the largest total value I can get from the first i items if I use at most w capacity?"
You can instead ask, for each i and w, "What is the smallest total value I can get from the first i items if I use at least w capacity?" The logic is almost identical, except that the direction of the comparison is reversed, and you need a special value to record the possibility that even taking all of the first i items cannot reach w capacity; infinity works for this, since you want it to lose against any finite value when they are compared with min().
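A minimal Python sketch of that reversed recurrence (the function name and interface are mine):

```python
import math

def min_value_at_least(items, w_min):
    """items: list of (value, weight) pairs; w_min: capacity to reach.
    Returns the smallest total value of a subset with weight >= w_min.
    dp[w] holds the smallest value seen so far that covers "at least w"
    capacity; math.inf marks "not reachable" and loses to any finite
    value under the comparison."""
    dp = [math.inf] * (w_min + 1)
    dp[0] = 0  # taking nothing already satisfies "at least 0"
    for value, weight in items:
        for w in range(w_min, 0, -1):  # descending: 0/1, each item used once
            # Taking this item from a state that still needed w capacity
            # leaves a remaining requirement of max(0, w - weight).
            prev = max(0, w - weight)
            if dp[prev] + value < dp[w]:
                dp[w] = dp[prev] + value
    return dp[w_min]  # math.inf if even all items together fall short
```

For example, min_value_at_least([(60, 10), (100, 20), (120, 30)], 50) returns 220: the 20- and 30-weight items reach the required capacity at the smallest total value.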

Utility Maximizing Assignment

I posted this on the Computer Science section but no one replied :(. Any help would be greatly appreciated :).
There is a grid of size MxN, with M ~ 20000 and N ~ 10, so M is very large. One way to look at this is as N grid blocks of size M placed side by side. Now assume there are K users, each with an MxN utility matrix whose entries give the utility that user obtains if assigned that grid element. The allocation must be done so that each assigned user's total utility exceeds a certain threshold utility U in every grid block. Only one user can be assigned to a grid element. What is the maximum number of users that can be assigned? (It's okay if some users are not assigned.)
Level 2: Now assume that for each user the utility threshold U only needs to be exceeded in at least n out of the N blocks. For this problem, what's the maximum number of users that can be assigned?
Of course brute-force search is of no use here, due to the K^(MN) complexity. I am guessing that some kind of dynamic programming approach may be possible.
To my understanding, the problem can be modelled as a maximum bipartite matching problem, which can be solved efficiently with the Hungarian algorithm. In the left partition L, create K nodes, one for each user. In the right partition R, create M*N nodes, one for each cell in the grid. Then create an edge between each l in L and each r in R with weight equal to the utility of assigning user l to grid cell r.
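As a sketch of that modelling (my code, not the answerer's), SciPy ships a rectangular Hungarian-algorithm implementation. Note that this maximizes total utility only; it does not enforce the per-block threshold U, which is the gap the next answer exploits:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_users(utility):
    """utility: K x (M*N) array; utility[k, c] is the utility user k
    gets from grid cell c (the N blocks flattened into one axis).
    Returns arrays (users, cells) pairing each user with one cell."""
    users, cells = linear_sum_assignment(np.asarray(utility), maximize=True)
    return users, cells
```

At M*N around 200,000 cells, the cubic worst case of the Hungarian algorithm is also a practical concern, quite apart from the threshold constraints.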
Using a different interpretation of your question than Codor, I am going to claim that (at least in theory, in the worst case) it is a hard problem.
Suppose that we can solve it in the special case when there is one block which must be shared between two users, who each have the same utility for each cell, and the threshold utility U is (half of the total utility for all the cells in the block), minus one.
This means that in order to solve the problem we must take a list of numbers and divide them up into two sets such that the sum of the numbers in each set is the same, and is exactly half of the total sum of the numbers available.
This is the Partition problem (http://en.wikipedia.org/wiki/Partition_problem), which is NP-complete, so if you could solve your problem as I have described it, you could solve a problem known to be hard.
(However the Wikipedia entry does say that this is known as "the easiest hard problem" so even if I have described it correctly, there may be solutions that work well in practice).

Suggestions for fragment proposal algorithm

I'm currently trying to solve the following problem, but am unsure which algorithm I should be using. It's in the area of mass identification.
I have a series of "weights", *w_i*, which can sum up to a total weight. The as-measured total weight has an error associated with it, and is thus inexact.
I need to find, given the total weight T, the closest k possible combinations of weights that can sum up to the total, where k is an input from the user. Each weight can be used multiple times.
Now, this sounds suspiciously like the bounded-integer multiple knapsack problem; however:
it is possible to go over the weight, and
I also want all of the solutions, ranked by error.
I could probably solve it using multiple sweeps of a knapsack solver, from weight - error to weight + error, stepping in small enough increments; however, if the increment is too large it is possible to miss certain weight combinations that could be used.
The number of weights is usually small (4 to 10) and the ratio of the total weight to the mean weight is usually around 2 or 3.
Does anyone know the name of an algorithm that might be suitable here?
Your problem effectively resembles the knapsack problem, which is an NP-complete problem.
For a really limited number of weights, you could run over every combination with repetition and then sort, which gives you quite a high number of operations: (n + k - 1)! / ((n - 1)! · k!) for the combinations and n·log(n) for the sorting part.
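Here is a brute-force sketch in Python along those lines (the names are illustrative). Rather than fixing k items per combination, it enumerates every multiset of weights whose sum stays below T plus the allowed overshoot, then ranks candidates by error:

```python
def closest_combinations(weights, T, k, tol):
    """weights: the available fragment weights; T: measured total;
    tol: how far above T a candidate may go; k: number of ranked
    candidates to return.  Exponential in general, but fine for
    4-10 weights with total/mean around 2 or 3."""
    weights = sorted(weights)
    results = []

    def extend(start, total, combo):
        if combo:
            results.append((abs(total - T), total, list(combo)))
        for i in range(start, len(weights)):
            if total + weights[i] > T + tol:
                break  # weights are sorted, so later ones overshoot too
            combo.append(weights[i])
            extend(i, total + weights[i], combo)  # i, not i + 1: reuse allowed
            combo.pop()

    extend(0, 0.0, [])
    results.sort()  # rank by |sum - T|
    return results[:k]
```

Because every candidate under the bound is enumerated, nothing can be missed the way a too-coarse sweep from weight - error to weight + error could miss a combination.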
Solving this kind of problem in a reasonable amount of time is best done by evolutionary algorithms nowadays.
If you take the following example from deap, an evolutionary algorithm framework in Python:
ga_knapsack.py, you'll see that by modifying lines 58-59, which automatically discard an overweight solution, into something smoother (a linear penalty, for instance), it will give you solutions close to the optimal one in a shorter time than brute force. Solutions are already sorted for you at the end, as you requested.
As a first attempt I'd go for constraint programming (but then I almost always do, so take the suggestion with a pinch of salt):
Given W = {w_1, ..., w_n} for the weights, E = {e_1, ..., e_n} for the errors (you can also make them asymmetric), and the total T:
Find all sets S (or lists, if the weights are not unique) such that (w_1 + e_1) + ... + (w_k + e_k) ≈ T, where w_1, ..., w_k ∈ W and e_1, ..., e_k ∈ E, within some delta which you derive from k. Or just set delta to some reasonably large value and decrease it as you are solving the constraints.
I just realise that you also want to parametrise the expression w_n op e_m over op ∈ {+, -} (any combination of weights and error terms), and off the top of my head I don't know which constraint solver would allow you to do that. In any case, you can always fall back to Prolog. It may not fly, especially if you have a lot of weights, but it will give you solutions quickly.
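As a sketch of the constraint route, here is one way to write it with OR-tools CP-SAT. The solver choice is mine, not the answerer's; the weights are assumed to be scaled to integers, and the per-weight error terms are folded into a single deviation from T that gets minimised:

```python
from ortools.sat.python import cp_model

def nearest_sum(weights, T, max_count):
    """weights and T scaled to integers (CP-SAT needs integer domains);
    max_count bounds how often each weight may be reused."""
    model = cp_model.CpModel()
    counts = [model.NewIntVar(0, max_count, f"c{i}")
              for i in range(len(weights))]
    ub = sum(w * max_count for w in weights)
    total = model.NewIntVar(0, ub, "total")
    model.Add(total == sum(c * w for c, w in zip(counts, weights)))
    # dev = |total - T|, the quantity to minimise.
    diff = model.NewIntVar(-max(T, ub), ub, "diff")
    model.Add(diff == total - T)
    dev = model.NewIntVar(0, max(T, ub), "dev")
    model.AddAbsEquality(dev, diff)
    model.Minimize(dev)
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [solver.Value(c) for c in counts], solver.Value(total)
```

Getting the k closest solutions rather than a single best one would mean re-solving with the previous optimum excluded, or collecting candidates through the solver's solution callbacks.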

Group incoming and outgoing invoices to make their sum 0

I've faced an interesting problem today, and decided to write an algorithm in C# to solve it.
There are incoming invoices with negative totals and outgoing invoices with positive totals. The task is to form groups of these invoices in which the invoice totals add up to exactly 0. A group can have any number of members, so two positive invoices and one negative one whose totals sum to 0 are fine.
We try to minimize the sum of the totals of the invoices left ungrouped, and there are no other constraints at all.
I'm wondering whether this problem can be traced back to a known one, and if not, what the most effective approach would be. The naive approach would be to separate incoming and outgoing invoices into two groups, sort by total, then try adding invoices one by one until zero is reached or the sign has changed. However, this presumes that the invoices in a group are of roughly the same magnitude, which is not true (one huge incoming invoice could be matched against 10 smaller outgoing ones).
Any ideas?
The problem you are facing is a well-known and studied one, called the Subset Sum Problem.
Unfortunately, the problem is NP-complete, so there is no known polynomial solution for it (1).
In fact, there is no known polynomial solution to even determine if such a subset (even a single one) exists, let alone find it.
However, if your input consists of relatively small (in absolute value) integers, there is a pretty efficient pseudo-polynomial dynamic programming solution that can be utilized to solve the problem.
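A minimal sketch of that DP, adapted to signed invoice totals (the names are illustrative; totals are assumed to be integers, e.g. cents):

```python
def zero_sum_group(totals):
    """totals: signed integer invoice totals.  Returns the indices of
    one non-empty group summing to exactly 0, or None if none exists.
    Classic pseudo-polynomial subset-sum DP: 'parent' remembers how
    each sum was first reached so the group can be reconstructed."""
    parent = {0: None}            # sum -> (invoice index, previous sum)
    for i, t in enumerate(totals):
        for s in list(parent):    # snapshot: each invoice used at most once
            new = s + t
            if new == 0:          # a non-empty group sums to zero
                group, cur = [i], s
                while parent[cur] is not None:
                    j, prev = parent[cur]
                    group.append(j)
                    cur = prev
                return group
            if new not in parent:
                parent[new] = (i, s)
    return None
```

Repeatedly extracting a group found this way and removing its invoices is a natural greedy strategy for the original task, though it does not guarantee the globally minimal leftover.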
If this is not the case, some other alternatives are:
Using an exponential solution like brute force (you might be able to optimize it using a branch and bound technique)
Heuristic solutions, such as steepest-ascent hill climbing or genetic algorithms
Approximation algorithms
(1) And most computer science researchers believe none exists; this is basically the P vs. NP problem.

What's the most insidious way to pose this problem?

My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the cost of the various permutations is not the same. It can be computed as the sum of the squares of the distances traveled between d_(i-1) and d_i, where d_0 is taken to be the base station, with the caveat that any segment that involves a change of direction costs 3 times as much (imagine this going on on a railroad or in a pneumatic tube, where backing up disrupts other traffic).
Given the set of deliveries D represented as their distance from the base station (so abs(di-dj) is the distance between two deliveries) and an iterator permutations(D) which will produce each permutation in succession, find a permutation which has a cost less than or equal to that of any other permutation.
Now, a direct implementation from this description might lead to code like this:
function Cost(D) ...
function Best_order(D)
    for D1 in permutations(D)
        Found = true
        for D2 in permutations(D)
            Found = false if cost(D2) < cost(D1)
        return D1 if Found
Which is O(n·(n!)²), i.e. pretty awful--especially compared to the O(n log(n)) someone with insight would find, by simply sorting D.
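For reference, a direct Python rendering of both the cost function and the intended answer (assuming, as the description above does, a one-way trip from the base; the answers below probe cases where a plain sort is not enough):

```python
def cost(route, base=0, penalty=3):
    """Sum of squared hop lengths from the base through the route;
    a hop that reverses direction costs `penalty` times as much."""
    total, pos, direction = 0, base, 0
    for d in route:
        hop = d - pos
        c = hop * hop
        if direction != 0 and hop * direction < 0:  # change of direction
            c *= penalty
        total += c
        pos = d
        if hop != 0:
            direction = 1 if hop > 0 else -1
    return total

def best_order(D):
    # The hidden insight: with one-dimensional, non-negative distances
    # and no return leg, sorted order never changes direction and never
    # re-covers ground, giving the O(n log n) solution.
    return sorted(D)
```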
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?
I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult hide this fact as much as possible.
The biggest clue is the distance formula. It introduces a penalty for changing directions. The first thing that comes to my mind is minimizing this penalty. To remove the penalty I have to order the deliveries in a single direction, and this ordering is the natural sort order.
I would remove the penalty for changing directions, it's too much of a give away.
Another major clue is the input to the algorithm: a list of integers. Give them a list of permutations instead, or even all permutations. That sets them up to think that an O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by:
Return permutation P such that the cost of P is less than or equal to any other permutation.
All that really needs to be done is read in the first permutation and sort it.
If they construct a single loop to compare the costs ask them what the big-o runtime of their algorithm is where n is the number of delivery locations (Another trap).
This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 -> 2 -> 1 -> -1.
Edit: This might not actually be the optimal path, but it illustrates the point.
You could rephrase it, having first found the optimal solution, as:
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means the smallest number results from the sum of all stage costs, taking into account that all stages (A..Z) need to be present once and only once.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.
This reminds me of the Knapsack problem more than the Traveling Salesman. But Knapsack is also NP-hard, so you might be able to fool people into thinking up an overly complex solution using dynamic programming if they associate your problem with Knapsack. The basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, a fairly good solution can be found when each value vj (your distances) is distinct, as follows:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed, empirical complexity is of the order of O((log n)²) and very large problems can be solved very quickly, e.g. in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers.
So you might want to state that several stops/packages may share the same vj, inviting people to think about the really hard case:
However, in the degenerate case of multiple items sharing the same value vj it becomes much more difficult, with the extreme case where vj = constant being the subset sum problem, with a complexity of O(2^(N/2)·N).
So if you replace value per unit of weight with value per unit of distance, and state that several distances may actually share the same value (the degenerate case), some folks might fall into this trap.
Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrasing the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines so the person would have to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons, e.g. "can you speed up the algorithm by using the optimal subset of your best results so far?"
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.
You need to be clearer on whether the delivery truck has to return to base (making it a round trip) or not. If the truck does return, then a simple sort does not produce the shortest route, because the squared cost of the return from the furthest point to base is so large. Skipping some stops on the way out and serving them on the way back turns out to be cheaper.
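A quick numeric check of that, reusing the cost() sketch from the question above, with the return to base appended as a final stop at 0:

```python
# Deliveries at 1, 2 and 3, with a forced return to the base at 0.
print(cost([1, 2, 3, 0]))  # sorted route:            1 + 1 + 1 + 3*9 = 30
print(cost([2, 3, 1, 0]))  # serve 1 on the way back: 4 + 1 + 3*4 + 1 = 18
```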
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?
