Advanced Math - Solving Optimal Sets Using a Program - algorithm

I want to develop a program that analyzes sets. I think the best way to describe it is with an example. For those of you familiar with toggle coverage, that is the purpose of this application.
The goal is to reach 100% coverage.
TestA stresses X% of the chip, but the percentage doesn't matter; what matters is which set of pins/portions of the chip is stressed. So let us say TestA stresses setA, TestB stresses setB, and so on for Y tests until we reach 100% coverage.
Here is the problem: we want to reduce Y to Y', where Y' is the minimum number of tests required. How? Let's say TestA can be eliminated because running TestB, C, and D covers everything TestA would have covered.
The question I have is: I want to do research in this area (IEEE articles and so on) but don't know what to search for. I am looking for titles, papers, etc. to help me determine an algorithm. If you have 1000 tests, I don't want to ask "Can I eliminate TestA with B? No? What about B+C? No? What about B+C+D?" In addition to being very slow, that doesn't account for the fact that, sure, A might be replaceable by B+C+D, but A would have significantly helped with removing D+E+F.
I'd appreciate help in going in the right direction.
Thanks!

Sounds to me like a variation of the Set Cover Problem, which is NP-Complete.
Set Cover Problem:
Given a universe of elements U and a collection of sets S = {X | X is a subset of U}, find a subset S' of S such that the union of the sets in S' is U and |S'| is minimal.
Since the problem is NP-complete, there is no known polynomial-time algorithm for it, and most believe one does not exist.
You can try approximation algorithms (for example, formulate the problem as an integer linear program and round the solution of its LP relaxation), or heuristics such as greedy selection.
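For example, the standard greedy heuristic, which repeatedly picks the set covering the most still-uncovered elements, achieves a ln(n) approximation factor. A minimal sketch in Python (the test names and coverage sets are made up):

# Greedy set-cover heuristic: repeatedly pick the test whose coverage set
# adds the most still-uncovered elements.
def greedy_cover(universe, sets_by_name):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets_by_name, key=lambda name: len(sets_by_name[name] & uncovered))
        gained = sets_by_name[best] & uncovered
        if not gained:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(best)
        uncovered -= gained
    return chosen

tests = {
    "TestA": {1, 2, 3},
    "TestB": {2, 4},
    "TestC": {3, 4, 5},
    "TestD": {1, 5},
}
print(greedy_cover({1, 2, 3, 4, 5}, tests))  # -> ['TestA', 'TestC']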


Affect m students to n groups, but with constraints?

I asked about the minimum cost maximum flow several weeks ago. Kraskevich's answer was brilliant and solved my problem. I've implemented it and it works fine (available only in French, sorry). Additionally, the algorithm can handle the assignment of i (i > 1) projects to each student.
Now I'm trying something more difficult. I'd like to add constraints on the choices. In the case where one wants to assign i (i > 1) projects to each student, I'd like to be able to specify which projects are compatible with each other.
In the case where some projects are not compatible, I'd like the algorithm to return the global optimum, i.e. assign i projects to each student, maximizing global happiness while respecting the compatibility constraints.
Chaining the original method i times (and checking the constraints at each step) will not help, as it would only return a local optimum.
Any idea about the correct graph to work with?
Unfortunately, it is not solvable in polynomial time (unless P = NP or there are additional constraints).
Here is a polynomial-time reduction from the maximum independent set problem (which is known to be NP-complete) to this one:
Given a graph G and a number k, do the following:
Create a project for each vertex in the graph G and say that two projects are incompatible iff there is an edge between the corresponding vertices in G.
Create one student who likes every project equally (we can assume that the happiness each project gives him is equal to 1).
Find the maximum happiness using an algorithm that solves the problem stated in your question. Let's call it h.
A set of projects can be picked iff they are all pairwise compatible, which means that the picked vertices of G form an independent set (due to the way we constructed the graph).
Thus, h is equal to the size of the maximum independent set.
Return h >= k.
What does it mean in practice? It means that it is not reasonable to look for a polynomial time solution to this problem. There are several things that can be done:
If the input is small, you can use exhaustive search.
If it is not, you can use heuristics and/or approximations to find a relatively good solution (not necessarily the optimal one, though).
If you can stomach the library dependency, integer programming will be quicker and easier than anything you can implement yourself. All you have to do is formulate the original problem as an integer program and then add your ad hoc constraints at the end.
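As an illustration (this is a fresh sketch, not the flow-based method from the original answer), here is how such a model might look with the PuLP front-end; the student/project names, happiness scores, and the incompatible pair are all made up:

# Assignment with compatibility constraints as an ILP (pip install pulp).
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

students = ["s1", "s2"]
projects = ["p1", "p2", "p3", "p4"]
happiness = {("s1", "p1"): 3, ("s1", "p2"): 2, ("s1", "p3"): 1, ("s1", "p4"): 1,
             ("s2", "p1"): 1, ("s2", "p2"): 3, ("s2", "p3"): 2, ("s2", "p4"): 1}
incompatible = [("p1", "p2")]   # p1 and p2 cannot go to the same student
k = 2                           # projects per student

prob = LpProblem("assignment", LpMaximize)
x = LpVariable.dicts("x", [(s, p) for s in students for p in projects], cat=LpBinary)
prob += lpSum(happiness[s, p] * x[s, p] for s in students for p in projects)
for s in students:
    prob += lpSum(x[s, p] for p in projects) == k        # exactly k projects each
for p in projects:
    prob += lpSum(x[s, p] for s in students) <= 1        # each project assigned at most once
for s in students:
    for p, q in incompatible:
        prob += x[s, p] + x[s, q] <= 1                   # the ad hoc compatibility constraint

prob.solve()
print([(s, p) for s in students for p in projects if x[s, p].value() == 1])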

What are some examples of problems well suited for Integer Linear Programming?

I've always written software to solve business problems. I came across ILP while going through one of the SO posts. I googled it, but I am unable to relate how I can use it to solve business problems. I'd appreciate it if someone could help me understand it in layman's terms.
ILP can be used to solve essentially any problem that involves making a bunch of decisions, each of which has only a small number of possible outcomes, all known ahead of time, and in which the overall "quality" of any combination of choices can be described using a function that doesn't depend on "interactions" between choices. To see how it works, it's easiest to restrict attention to variables that can only be 0 or 1 (the smallest useful range of integers). Now:
Each decision requiring a yes/no answer becomes a variable
The objective function should describe the thing we want to maximise (or minimise) as a weighted combination of these variables
You need to find a way to express each constraint (combination of choices that cannot be made at the same time) using one or more linear equality or inequality constraints
Example
For example, suppose you have 3 workers, Anne, Bill and Carl, and 3 jobs, Dusting, Typing and Packing. All of the people can do all of the jobs, but they each have different efficiency/ability levels at each job, so we want to find the best task for each of them to do to maximise overall efficiency. We want each person to perform exactly 1 job.
Variables
One way to set this problem up is with 9 variables, one for each combination of worker and job. The variable x_ad will get the value 1 if Anne should Dust in the optimal solution, and 0 otherwise; x_bp will get the value 1 if Bill should Pack in the optimal solution, and 0 otherwise; and so on.
Objective Function
The next thing to do is to formulate an objective function that we want to maximise or minimise. Suppose that based on Anne, Bill and Carl's most recent performance evaluations, we have a table of 9 numbers telling us how many minutes it takes each of them to perform each of the 3 jobs. In this case it makes sense to take the sum of all 9 variables, each multiplied by the time needed for that particular worker to perform that particular job, and to look to minimise this sum -- that is, to minimise the total time taken to get all the work done.
Constraints
The final step is to give constraints that enforce that (a) everyone does exactly 1 job and (b) every job is done by exactly 1 person. (Note that actually these steps can be done in any order.)
To make sure that Anne does exactly 1 job, we can add the constraint that x_ad + x_at + x_ap = 1. Similar constraints can be added for Bill and Carl.
To make sure that exactly 1 person Dusts, we can add the constraint that x_ad + x_bd + x_cd = 1. Similar constraints can be added for Typing and Packing.
Altogether there are 6 constraints. You can now supply this 9-variable, 6-constraint problem to an ILP solver and it will spit back out the values for the variables in one of the optimal solutions -- exactly 3 of them will be 1 and the rest will be 0. The 3 that are 1 tell you which people should be doing which job!
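For concreteness, here is the same 9-variable, 6-constraint model in PuLP; the minutes in the table are invented for illustration:

# Worker-job assignment ILP (pip install pulp). Times are made up.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

workers, jobs = ["Anne", "Bill", "Carl"], ["Dust", "Type", "Pack"]
minutes = {("Anne", "Dust"): 8, ("Anne", "Type"): 4, ("Anne", "Pack"): 6,
           ("Bill", "Dust"): 5, ("Bill", "Type"): 7, ("Bill", "Pack"): 5,
           ("Carl", "Dust"): 6, ("Carl", "Type"): 9, ("Carl", "Pack"): 4}

prob = LpProblem("jobs", LpMinimize)
x = LpVariable.dicts("x", [(w, j) for w in workers for j in jobs], cat=LpBinary)
prob += lpSum(minutes[w, j] * x[w, j] for w in workers for j in jobs)
for w in workers:
    prob += lpSum(x[w, j] for j in jobs) == 1      # everyone does exactly 1 job
for j in jobs:
    prob += lpSum(x[w, j] for w in workers) == 1   # every job done by exactly 1 person

prob.solve()
print([(w, j) for w in workers for j in jobs if x[w, j].value() == 1])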
ILP is General
As it happens, this particular problem has a special structure that allows it to be solved more efficiently using a different algorithm. The advantage of using ILP is that variations on the problem can be easily incorporated: for example, if there were actually 4 people and only 3 jobs, then we would need to relax the constraints so that each person does at most 1 job, instead of exactly 1 job. This can be expressed simply by changing the equals sign in each of the first 3 constraints into a less-than-or-equals sign.
First, read a linear programming example from Wikipedia
Now imagine the farmer producing pigs and chickens, or a factory producing toasters and vacuums - now the outputs (and possibly constraints) are integers, so those pretty graphs are going to go all crookedly step-wise. That's a business application that is easily represented as a linear programming problem.
I've used integer linear programming before to determine how to tile n identically proportioned images to maximize screen space used to display these images, and the formalism can represent covering problems like scheduling, but business applications of integer linear programming seem like the more natural applications of it.
SO user flolo says:
Use cases where I have often met it: in digital circuit design you have objects to be placed/mapped onto certain parts of a chip (FPGA placing) - this can be done with ILP. Also, in HW-SW codesign the partitioning problem often arises: which part of a program should still run on a CPU and which part should be accelerated in hardware? This can also be solved via ILP.
A sample ILP problem looks something like:
maximize 37∙x1 + 45∙x2
where
x1, x2, ... should be non-negative integers
...but there is a set of constraints of the form
a1∙x1 + b1∙x2 ≤ k1
a2∙x1 + b2∙x2 ≤ k2
a3∙x1 + b3∙x2 ≤ k3
...
Now, a simpler articulation of Wikipedia's example:
A farmer has L m² land to be planted with either wheat or barley or a combination of the two.
The farmer has F grams of fertilizer, and P grams of insecticide.
Every m² of wheat requires F1 grams of fertilizer, and P1 grams of insecticide
Every m² of barley requires F2 grams of fertilizer, and P2 grams of insecticide
Now,
Let a1 denote the selling price of wheat per 1 m²
Let a2 denote the selling price of barley per 1 m²
Let x1 denote the area of land to be planted with wheat
Let x2 denote the area of land to be planted with barley
x1, x2 are non-negative integers (assume we can plant at 1 m² resolution)
So,
the profit is a1∙x1 + a2∙x2 - we want to maximize it
Because the farmer has a limited area of land: x1 + x2 ≤ L
Because the farmer has a limited amount of fertilizer: F1∙x1 + F2∙x2 ≤ F
Because the farmer has a limited amount of insecticide: P1∙x1 + P2∙x2 ≤ P
a1, a2, L, F1, F2, F, P1, P2, P are all constants (in our example: positive)
We are looking for non-negative integers x1, x2 that maximize the stated expression, given the stated constraints.
Hope it's clear...
ILP "by itself" can directly model lots of stuff. If you search for LP examples you will probably find lots of famous textbook cases, such as the diet problem
Given a set of pills, each with a vitamin content, and a daily vitamin quota, find the cheapest cocktail that matches the quota.
Many such problems naturally have instances that require the variables to be integers (perhaps you can't split pills in half).
The really interesting part, though, is that a great many combinatorial problems reduce to LP. One of my favourites is the assignment problem:
Given a set of N workers, N tasks, and an N by N matrix describing how much each worker charges for each task, determine which task to give to each worker in order to minimize cost.
Most solutions that come up naturally have exponential complexity, but there is a polynomial-time solution using linear programming.
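For instance (this goes beyond the original answer), SciPy ships a dedicated polynomial-time solver for exactly this problem, so you rarely have to set up the LP by hand; the cost matrix below is invented:

# Polynomial-time assignment: cost[i][j] = what worker i charges for task j.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])
rows, cols = linear_sum_assignment(cost)   # minimizes the total cost
print(list(zip(rows, cols)), cost[rows, cols].sum())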
ILP has the added benefit/difficulty of being NP-complete. This means that it can be used to model a very wide range of problems (boolean satisfiability is also very popular in this regard). Since there are many good, optimized ILP solvers out there, it is often viable to translate an NP-complete problem into ILP instead of devising a custom algorithm of your own.
You can apply linear programming easily wherever you want to optimize something and the objective function is linear. You can build schedules (I mean big ones, like those of train companies, which need to optimize the utilization of their vehicles and tracks), plan production (to maximize profit), almost everything. Sometimes it is tricky to formulate your problem as an IP, and/or sometimes you run into the problem that your solution says you should produce e.g. 0.345 cars for the optimal profit. That is of course not possible, so you constrain further: the variable for the number of cars must be an integer. Even though this sounds simpler (because you have far fewer choices for your variable), it is actually harder. At this moment the problem becomes NP-hard, which means that essentially any problem in NP can be transformed into an ILP.
For you, I would recommend reading some basic (I)LP material. Offhand I don't know any good online resource (but if you google you will find some); as a book I can recommend Linear Programming by Chvátal. It has very good examples and also describes real use cases.
The other answers here have excellent examples. Two of the gold standards for the use of integer programming, and more generally operations research, in business are
the journal Interfaces published by INFORMS (The Institute for Operations Research and the Management Sciences)
winners of the Franz Edelman Award for Achievement in Operations Research and the Management Sciences
Interfaces publishes research that uses operations research applied to real-world problems, and the Edelman award is a highly competitive award for business use of operations research techniques.

backtracking algorithm for set cover

Can someone provide me with a backtracking algorithm to solve the "set cover" problem to find the minimum number of sets that cover all the elements in the universe?
The greedy approach almost always selects more sets than the optimal number of sets.
This paper uses Linear Programming Relaxation to solve covering problems.
Basically, the LP relaxation yields good bounds, and can be used to identify solutions that are optimum in many cases. Incidentally, when I last looked at open source LP solvers (~2003) I wasn't impressed (some gave incorrect results), but there seem to be some decent open source LP solvers now.
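As a sketch of that idea, here is the LP relaxation of a toy set-cover instance using scipy.optimize.linprog; the optimum of the relaxation is a lower bound on the number of sets needed:

# LP relaxation of set cover: minimize sum(x) subject to every element
# being covered at least once, with 0 <= x_i <= 1. The toy data is made up.
import numpy as np
from scipy.optimize import linprog

subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, {1, 5}]
universe = sorted(set().union(*subsets))

# "Cover e at least once" is sum_i [e in S_i] * x_i >= 1; linprog wants
# A_ub @ x <= b_ub, so both sides are negated.
A = np.array([[-1.0 if e in s else 0.0 for s in subsets] for e in universe])
b = -np.ones(len(universe))
c = np.ones(len(subsets))

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * len(subsets))
print(res.fun, res.x)   # res.fun lower-bounds the integer optimum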
Your problem needs a little more clarification - it seems that you are given a family of subsets $$S_1,\ldots,S_n$$ of a set A, such that the union of the subsets equals A, and you want a minimum number of subsets whose union is still A.
The basic approach is branch and bound with some heuristics. E.g., if a particular element of A is in only one subset $$S_i$$, then you must select $$S_i$$. Similarly, if $$S_k$$ is a subset of $$S_j$$, then there's no reason to consider $$S_k$$; and if element $$a_i$$ is in every subset that $$a_j$$ is in, then covering $$a_j$$ automatically covers $$a_i$$, so you need not consider $$a_i$$.
For branch and bound you need good bounding heuristics. Lower bounds can come from independent sets: if there are $$k$$ elements of A, no two of which occur together in any subset, then every cover needs at least $$k$$ subsets. Better lower bounds come from the LP relaxation described above.
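A minimal backtracking skeleton along these lines, with the simplest possible pruning bound (the dominance reductions above would be added in practice; the instance is made up):

# Branch and bound for set cover: branch on the subsets containing one
# uncovered element, pruning once the partial cover cannot beat the best.
def min_cover(universe, subsets):
    best = list(range(len(subsets)))           # trivial upper bound: take everything

    def backtrack(uncovered, chosen):
        nonlocal best
        if not uncovered:
            if len(chosen) < len(best):
                best = chosen[:]
            return
        if len(chosen) + 1 >= len(best):       # even one more set cannot improve
            return
        e = min(uncovered)                     # branch on one uncovered element
        for i, s in enumerate(subsets):
            if e in s:
                chosen.append(i)
                backtrack(uncovered - s, chosen)
                chosen.pop()

    backtrack(frozenset(universe), [])
    return best

subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, {1, 5}]
print(min_cover({1, 2, 3, 4, 5}, subsets))     # -> [0, 3]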
The Espresso logic minimization system from Berkeley has a very high quality set covering engine.

Efficient evaluation of hypergeometric functions

Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1) where a, b, c, and d are all positive reals and c+d > a+b+1. There are many special cases that have a closed-form formula, but as far as I know there are no such formulas in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Pade approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
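If you want to try it yourself, Aitken's delta-squared transform of the partial sums is only a few lines (a sketch; seq holds successive partial sums of the series):

# Aitken delta-squared acceleration: maps a sequence of partial sums to a
# (hopefully) faster-converging one. Guards against a zero denominator.
def aitken(seq):
    out = []
    for i in range(len(seq) - 2):
        s0, s1, s2 = seq[i], seq[i + 1], seq[i + 2]
        denom = s2 - 2 * s1 + s0
        out.append(s2 - (s2 - s1) ** 2 / denom if denom else s2)
    return out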
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5 where convergence is rapid to get an initial value and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, due to z = 1 being a singularity (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
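For what it's worth, a multiprecision library such as mpmath implements 3F2 directly (including at z = 1) and can at least supply reference values; the parameters below are made up, chosen so that c + d > a + b + 1:

# Reference evaluation of 3F2(a, b, 1; c, d; 1) with mpmath (pip install mpmath).
from mpmath import mp, hyp3f2

mp.dps = 30                       # 30 decimal digits of working precision
a, b, c, d = 0.5, 1.25, 2.0, 1.5  # satisfies c + d > a + b + 1
print(hyp3f2(a, b, 1, c, d, 1))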
Is it correct that you want to sum a series where you know the ratio of successive terms and it is a rational function?
I think Gosper's algorithm and the rest of the tools for proving (and finding) hypergeometric identities do exactly this, right? (See Wilf and Zeilberger's A=B book online.)

What's the most insidious way to pose this problem?

My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the cost of the various permutations is not the same. It can be computed as the sum of the squares of the distances traveled between d(i-1) and d(i), where d0 is taken to be the base station, with the caveat that any segment that involves a change of direction costs 3 times as much (imagine this is going on on a railroad or in a pneumatic tube, where backing up disrupts other traffic).
Given the set of deliveries D represented as their distance from the base station (so abs(di-dj) is the distance between two deliveries) and an iterator permutations(D) which will produce each permutation in succession, find a permutation which has a cost less than or equal to that of any other permutation.
Now, a direct implementation from this description might lead to code like this:
from itertools import permutations

def cost(D):
    ...  # sum of squared segment lengths, tripled for direction changes

def best_order(D):
    for D1 in permutations(D):
        found = True
        for D2 in permutations(D):
            if cost(D2) < cost(D1):  # some other order is strictly cheaper
                found = False
        if found:
            return D1
Which is O(n * (n!)^2), i.e. pretty awful--especially compared to the O(n log n) that someone with insight would find by simply sorting D.
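For contrast, a sketch of the insightful solution (assuming non-negative distances, so that the sorted route never changes direction):

def best_order(D):
    # Visit deliveries in increasing distance from the base station.
    return sorted(D)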
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?
I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult hide this fact as much as possible.
The biggest clue is the distance formula. It introduces a penalty for changing directions. The first thing that comes to my mind is minimizing this penalty. To remove the penalty I have to order the stops in a single direction; this ordering is the natural sort order.
I would remove the penalty for changing directions; it's too much of a giveaway.
Another major clue is the input to the algorithm: a list of integers. Give them a list of permutations, or even all permutations. That sets them up to think that an O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by the formula above:
Return a permutation P such that the cost of P is less than or equal to that of any other permutation.
All that really needs to be done is read in the first permutation and sort it.
If they construct a single loop to compare the costs, ask them what the big-O runtime of their algorithm is, where n is the number of delivery locations (another trap).
This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 > 2 > 1 > -1.
Edit: This might not actually be the optimal path, but it illustrates the point.
You could rephrase it, having first found the optimal solution, as:
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means that the sum of all stage costs is smallest, taking into account that all stages (A..Z) need to be present once and only once.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.
This reminds me of the Knapsack problem more than the Traveling Salesman. But Knapsack is also NP-hard, so you might be able to fool people into thinking up an overly complex solution using dynamic programming if they associate your problem with Knapsack. The basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, a fairly good solution can be found when the values (your distances) are all distinct:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed, empirical complexity is of the order of O((log n)^2), and very large problems can be solved very quickly; e.g., in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers.
So you might want to state that several stops/packages can share the same vj, inviting people to think about the really hard case:
However, in the degenerate case of multiple items sharing the same value vj it becomes much more difficult, with the extreme case where vj = constant being the subset sum problem, with a complexity of O(2^(N/2)·N).
So if you replace value per unit of weight with value per unit of distance, and state that several distances may actually share the same value (the degenerate case), some folks might fall into this trap.
Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrase the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines, so the person would have to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons - e.g. "can you speed up the algorithm by using the optimum max subset of your best (so far) results"?
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.
You need to be clearer on whether the delivery truck has to return to base (making it a round trip), or not. If the truck does return, then a simple sort does not produce the shortest route, because the square of the return from the furthest point to base costs so much. Missing some hops on the way 'out' and using them on the way back turns out to be cheaper.
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?
