Post your best solutions! You can find the full problem description and examples here: ACM 2010 problems (pdf)
You have a set of castles connected by roads, and you want to conquer all the castles with the minimum number of soldiers. Each castle has three properties: the minimum number of soldiers required to take it, the number of soldiers that will die taking it, and the number of soldiers that must be left behind to hold it.
There is exactly one path between any two castles (the roads form a tree). You can pick any castle as the first target, but you must follow the roads afterward. You can travel each road at most twice. Your mobile army must stay in one group.
I would solve this this way:
Bruteforce all starting castles (100 max)
For each starting castle:
Fill up two arrays, need[] and cost[].
need[i] and cost[i] mean that when you arrive at castle i from the chosen starting point and try to conquer the subtree rooted at i, you need at least need[i] soldiers, and cost[i] soldiers will die.
min_solder_to_attack_castle[i] comes from the input file.
The need[] and cost[] values are immediate for leaf ("terminal") castles.
Then, for each castle whose children all have known need[] and cost[] values, you calculate need and cost for this castle as follows:
cost[i] = sum(cost[children])
Getting need[i] is the tricky part: we know it's somewhere between max(min_solder_to_attack_castle[all children]) and max(min_solder_to_attack_castle[all children]) + max(cost[all children]). Trying every visiting order would cost (number_of_children)! per castle, potentially n! overall. Some optimization is probably possible here, but this is where I stopped for now.
I would solve this in reverse - you want to have as few men "wasted" after taking the last castle as possible. Since we can't pass through a castle without taking it, we will obviously end at a "leaf" castle.
It is straightforward to walk backwards from all leaf castles to determine the total number of men "wasted" on each subtree - then it's simply a matter of walking the subtrees in the right order.
Elementary, my dear Watson.
The first thing to realize is that, as far as the numbers go, there is no difference between soldiers lost and soldiers left behind. So we can reduce the castle properties to soldiers lost and required.
The second thing to realize is that if you go down a branch of the tree, you must complete the whole branch before returning. This allows us to reduce the entire branch to a single "mega castle" with aggregate soldiers required and lost.
So, assuming we can compute the costs of branches, we're left with two problems: where to start, and how to choose which branch to descend first. I'm just going to brute force the start position, but it might be possible to do better. Choosing which branch to descend is a bit harder. The number of soldiers lost is trivial, but the number required is not. There are n! possibilities, so we can't just try them all.
Instead of thinking about how many soldiers are lost/required at each castle, I'm going to go backwards. Start with 0 soldiers, and add them when you attack a castle, ensuring we end up with at least the required amount. There are two cases: either there is a castle which we meet the requirement for, or there is not. If there is, (un)do that castle (this is optimal, because we used the minimum number of soldiers). If there isn't, add an additional soldier and try again (this is optimal, because we must add a soldier to continue). Now it should become obvious: we want to (un)do the castles whose requirements are closest to the number lost first. Just sort by (required minus lost) and that's your order.
So the final algorithm looks like this:
Brute force the starting point
Recursively reduce branches into aggregate castles (memoize this result, for the other starting points)
Visit branches in descending (required minus lost) order.
The running time is O(n * c^2 * lg(c)), where n is the number of castles and c is the maximum connectivity of any single castle. This is the worst case because there are at most n*c 'branches', and a node takes at most c*lg(c) time to evaluate after its branches have been evaluated. [The branches and nodes are computed at most once thanks to memoization]
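A minimal Python sketch of the ordering rule above, assuming each branch has already been reduced to a (required, lost) pair with required >= lost. The names army_needed and best_order are made up for illustration, and the brute force over permutations is only there to compare against the greedy order on a tiny input.

```python
from itertools import permutations

def army_needed(branches):
    """Smallest army that can clear the branches in the given order.

    Each branch is a (required, lost) pair; assumes required >= lost.
    """
    need = 0
    lost_so_far = 0
    for required, lost in branches:
        # After the earlier losses we must still have `required` soldiers.
        need = max(need, lost_so_far + required)
        lost_so_far += lost
    return need

def best_order(branches):
    """Visit branches in descending (required - lost) order."""
    return sorted(branches, key=lambda b: b[0] - b[1], reverse=True)

# Hypothetical aggregate branches hanging off one castle.
branches = [(10, 2), (6, 5), (9, 3), (4, 1)]

as_given = army_needed(branches)
greedy = army_needed(best_order(branches))
brute = min(army_needed(list(p)) for p in permutations(branches))

print(as_given, greedy, brute)
```

On this input the order in which the branches happen to be listed needs more soldiers than the sorted order, and the sorted order matches the exhaustive minimum.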
I think it's possible to do better, but I'm not sure how.
I am prepping for a final and this was a practice problem. It is not a homework problem.
How do I go about attacking this? Also, more generally, how do I know when to use Greedy vs. Dynamic programming? Intuitively, I think this is a good place to use greedy. I'm also thinking that if I could somehow create an orthogonal line and "sweep" it, checking the number of intersections at each point and updating a global max, then I could just return the max at the end of the sweep. I'm not sure how to plane sweep algorithmically though.
a. We are given a set of activities I1 ... In: each activity Ii is represented by its left-point Li and its right-point Ri. Design a very efficient algorithm that finds the size of the largest subset of mutually overlapping activities (write your solution in English, bullet by bullet).
b. Analyze the time complexity of your algorithm.
Proposed solution:
Ex set: {(0,2) (3,7) (4,6) (7,8) (1,5)}
Max is 3 from interval 4-5
1) Split start and end points into two separate arrays and sort them in non-decreasing order
Start points: [0,1,3,4,7] (SP)
End points: [2,5,6,7,8] (EP)
I know that I can use two pointers to sort of simulate the plane sweep, but I'm not exactly sure how. I'm stuck here.
I'd say your idea of a sweep is good.
You don't need to worry about planar sweeping, just use the start/end points. Put the elements of each sorted array in a queue. In every step take the smaller of the two front elements. If it's a start point, increment the current task count, otherwise decrement it. The largest value the count ever reaches is the answer.
Since you don't need to report which tasks overlap, just how many, you don't need to worry about the duration of specific tasks.
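Here is a small Python sketch of that two-pointer sweep over the sorted start and end arrays. It assumes every activity has left < right and treats activities that merely touch at an endpoint as not overlapping; max_overlap is just an illustrative name.

```python
def max_overlap(intervals):
    """Largest number of activities that are active at the same time."""
    starts = sorted(left for left, _ in intervals)   # SP
    ends = sorted(right for _, right in intervals)   # EP
    i = j = 0            # front of each sorted array
    current = best = 0
    while i < len(starts):
        if starts[i] < ends[j]:
            # Next event is a start point: one more activity is running.
            current += 1
            best = max(best, current)
            i += 1
        else:
            # Next event is an end point: one activity has finished.
            current -= 1
            j += 1
    return best

example = [(0, 2), (3, 7), (4, 6), (7, 8), (1, 5)]
print(max_overlap(example))
```

Sorting dominates, so the whole thing runs in O(n log n); the sweep itself is linear.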
Regarding your greedy vs DP question: in my non-professional opinion, greedy may not always provide a valid answer, whereas DP only works for problems that can be divided cleanly into smaller subproblems. In this case, I wouldn't call your sweep solution either.
I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to perform:
For each delivery, the pickup needs to come before the dropoff.
The vehicle is quite small and the packages vary in size. The total carriage cannot exceed some upper bound (e.g. 1 cubic metre).
Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution:
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
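To experiment with the moves above, it helps to have a feasibility check to run on each candidate. The following Python sketch is one way to do it; the route representation (a list of ("pickup" or "dropoff", job) stops), the integer volumes, and the names is_feasible and swap_adjacent are all assumptions made for illustration.

```python
def is_feasible(route, size, capacity, onboard=()):
    """Check precedence and capacity for a route.

    route    -- list of ("pickup" | "dropoff", job) stops in visiting order
    size     -- dict mapping job to package volume
    capacity -- maximum total volume the vehicle can carry
    onboard  -- jobs already picked up when planning starts
    """
    carried = set(onboard)
    load = sum(size[job] for job in carried)
    if load > capacity:
        return False
    for kind, job in route:
        if kind == "pickup":
            if job in carried:
                return False          # picked up twice
            carried.add(job)
            load += size[job]
            if load > capacity:
                return False          # capacity exceeded
        else:
            if job not in carried:
                return False          # dropoff before pickup
            carried.remove(job)
            load -= size[job]
    return True

def swap_adjacent(route, i):
    """Return a copy of route with stops i and i + 1 exchanged."""
    new_route = list(route)
    new_route[i], new_route[i + 1] = new_route[i + 1], new_route[i]
    return new_route

# Hypothetical data: volumes in litres, 1 cubic metre = 1000 litres.
size = {"A": 400, "B": 500, "C": 300}
route = [("pickup", "A"), ("pickup", "B"), ("dropoff", "A"),
         ("pickup", "C"), ("dropoff", "B"), ("dropoff", "C")]

print(is_feasible(route, size, 1000))
print(is_feasible(swap_adjacent(route, 0), size, 1000))  # two pickups
print(is_feasible(swap_adjacent(route, 2), size, 1000))  # dropoff A, pickup C
```

In the last swap the free space before the dropoff of A is 100 litres, less than the 300 litres C needs, so the third rule above says the swap should be rejected, and the check agrees.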
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
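As a rough Python sketch of such a score: the function below assumes you already have each job's arrival time and deadline as dictionaries and the route's total travel time as a number. The name route_score and the weight parameter are made up for illustration.

```python
def route_score(arrival, deadline, total_travel_time, weight=1000):
    """Lower is better: late deliveries dominate, travel time breaks ties.

    arrival  -- dict mapping job to the time it is delivered
    deadline -- dict mapping job to its deadline
    Assumes total_travel_time stays below `weight`.
    """
    late = sum(1 for job, t in arrival.items() if t > deadline[job])
    return weight * late + total_travel_time

deadline = {"A": 15, "B": 20}
short_but_late = route_score({"A": 10, "B": 25}, deadline, 40)
long_but_on_time = route_score({"A": 12, "B": 18}, deadline, 900)

# The on-time route wins even though it is much longer.
print(short_but_late, long_but_on_time)
```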
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
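The pool management described in this section could be sketched in Python with the standard heapq module roughly as follows. The function name pool_search and its parameters are assumptions; neighbours and score stand in for your own move generator and scoring function, solutions are assumed to be comparable values such as tuples, and the initial pool is assumed to be non-empty. The toy problem at the bottom (sorting a tuple by adjacent swaps) only exercises the mechanics.

```python
import heapq

def pool_search(initial, neighbours, score, k=1000, steps=10000):
    """Repeatedly expand the best solution in a min-heap pool."""
    heap = [(score(s), s) for s in initial]
    heapq.heapify(heap)
    best = min(heap)
    last = None
    for _ in range(steps):
        if not heap:
            break
        entry = heapq.heappop(heap)
        if entry == last:
            continue                  # duplicate of the one just expanded
        last = entry
        best = min(best, entry)
        parent_score, parent = entry
        for child in neighbours(parent):
            child_score = score(child)
            if child_score < parent_score:   # keep only improvements
                heapq.heappush(heap, (child_score, child))
        if len(heap) >= 2 * k:
            # Pool has doubled: keep the best k and rebuild the heap.
            heap = heapq.nsmallest(k, heap)
            heapq.heapify(heap)
    return best

# Toy problem: solutions are tuples, a move swaps two adjacent items,
# and the score is the number of out-of-order pairs.
def inversions(s):
    return sum(1 for i in range(len(s))
               for j in range(i + 1, len(s)) if s[i] > s[j])

def adjacent_swaps(s):
    for i in range(len(s) - 1):
        t = list(s)
        t[i], t[i + 1] = t[i + 1], t[i]
        yield tuple(t)

result = pool_search([(3, 1, 2), (2, 3, 1)], adjacent_swaps, inversions, k=10)
print(result)
```

Because the extracted parent is not put back, the sketch remembers the best entry seen so far separately, so a local optimum is not lost when none of its children improve on it.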
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to be actually the Knapsack Problem, rather than the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.
There's a map with points:
The green number next to each point is that point's ID and the red number is the bonus for that point. I have to find the fastest cycle that starts and ends at point #1 and gains at least x (15 in this case) bonus points. I can use cities several times; however, I will gain bonus points only once.
I have to do this with a backtracking algorithm, but I don't really know where to start. I've studied it, but I can't see the connection between this problem and backtracking.
The output would look like this:
(1,3,5,2,1) (11.813 length)
Backtracking is a technique applied to reduce the search space of a problem. So, you have a problem, you have a space with optimal and non-optimal solutions, and you have to pick one optimal solution.
A simple strategy for your problem is to generate all possible solutions. However, that traverses the entire solution space, including regions where we could already tell that no optimal solution will be found.
That's the main role of backtracking: you traverse the space of solutions and, when you reach a point where you know no optimal answer can be achieved if the search continues on the same path, you simply undo the last step, go back in the traversal, and try the step that comes right after the one you found to be hopeless.
In your problem, since the nodes can be visited more than once, the idea is to maintain, for each vertex, a list of the other vertices sorted in decreasing order of distance from that vertex.
Then you can simply start at one of the vertices and walk the graph vertex by vertex, always checking whether the objective is still achievable, and backtracking whenever you notice that no solution is possible from a certain point.
You can use a recursive backtracking algorithm to list all possible cycles and keep the best answer:
visitCycles(list<Int> cycleSoFar)
{
    if cycle formed by closing (cycleSoFar) is better than best answer so far
    {
        best answer so far = cycle formed by closing (cycleSoFar)
    }
    if (cannot improve (cycleSoFar))
    {
        return
    }
    for each point that makes sense
    {
        add point to cycleSoFar
        visitCycles(cycleSoFar)
        remove point from cycleSoFar
    }
}
To add a bit more detail:
1) A cycle is no good unless it has at least 15 bonus points. If it is any good, it is better than the best answer so far if it is shorter.
2) As you add more points to a cycle you only make it longer, not shorter. So if you have found a possible answer and cycleSoFar is already at least as long as that possible answer, then you cannot improve it and you might as well return.
3) Since you don't get any bonus points by reusing points already in the cycle, it doesn't make sense to try adding a point twice.
4) You may be able to speed up the program by iterating over "each point that makes sense" in a sensible order, for instance by choosing the closest point to the current point first. You might save time by pre-computing, for each point, a list of all the other points in ascending order of distance (or you might not - you might have to try different schemes by experiment).
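A runnable Python version of the pseudocode and the four notes above might look like the sketch below. It assumes you can travel in a straight line between any two points, so revisiting a point never helps; the coordinates and bonuses are invented because the map itself is not reproduced here, and best_cycle is an illustrative name.

```python
import math

def best_cycle(points, bonus, start, min_bonus):
    """Shortest cycle from `start` collecting at least `min_bonus`.

    points -- dict mapping point id to (x, y)
    bonus  -- dict mapping point id to its bonus
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    best = {"length": math.inf, "cycle": None}

    def visit(path, length, gained):
        # 1) A cycle only counts once it has enough bonus points;
        #    it is better if closing it gives a shorter total length.
        if gained >= min_bonus:
            total = length + dist(path[-1], start)
            if total < best["length"]:
                best["length"] = total
                best["cycle"] = path + [start]
        # 2) Adding points only makes the path longer, so stop here
        #    if it is already as long as the best answer.
        if length >= best["length"]:
            return
        current = path[-1]
        # 3) Skip points already used; 4) try the closest ones first.
        candidates = sorted((p for p in points if p not in path),
                            key=lambda p: dist(current, p))
        for p in candidates:
            path.append(p)
            visit(path, length + dist(current, p), gained + bonus[p])
            path.pop()

    visit([start], 0.0, bonus[start])
    return best["cycle"], best["length"]

# Hypothetical map.
points = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (5, 5)}
bonus = {1: 0, 2: 5, 3: 5, 4: 10}
cycle, length = best_cycle(points, bonus, start=1, min_bonus=10)
print(cycle, round(length, 3))
```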
Given a bunch of sets of people (similar to):
[p1,p2,p3]
[p2,p3]
[p1]
[p1]
Select 1 from each set, trying to minimize the maximum number of times any one person is selected.
For the sets above, the max number of times a given person MUST be selected is 2.
I'm struggling to get an algorithm for this. I don't think it can be done with a greedy algorithm, more thinking along the lines of a dynamic programming solution.
Any hints on how to go about this? Or do any of you know any good websites about this stuff that I could have a look at?
This is neither dynamic programming nor greedy. Let's look at a different problem first: can it be done by selecting every person at most once?
You have P people and S sets. Create a graph with S+P vertices, representing sets and people. There is an edge between person pi and set sj iff pi is an element of sj. This is a bipartite graph and the decision version of your problem is then equivalent to testing whether the maximum cardinality matching in that graph has size S.
As detailed on that page, this problem can be solved by using a maximum flow algorithm (note: if you don't know what I'm talking about, then take your time to read it now, as you won't understand the rest otherwise): first create a super-source, add an edge linking it to all people with capacity 1 (representing that each person may only be used once), then create a super-sink and add edges linking every set to that sink with capacity 1 (representing that each set may only be used once) and run a suitable max-flow algorithm between source and sink.
Now, let's consider a slightly different problem: can it be done by selecting every person at most k times?
If you paid attention to the remarks in the last paragraph, you should know the answer: just change the capacity of the edges leaving the super-source from 1 to k, to indicate that each person may be used up to k times.
Therefore, you now have an algorithm to solve the decision problem in which people are selected at most k times. It's easy to see that if you can do it with k, then you can also do it with any value greater than k, that is, it's a monotonic function. Therefore, you can run a binary search on the decision version of the problem, looking for the smallest k possible that still works.
Note: You could also get rid of the binary search by testing each value of k sequentially, and augmenting the residual network obtained in the last run instead of starting from scratch. However, I decided to explain the binary search version as it's conceptually simpler.
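For a concrete feel, here is a hedged Python sketch of the binary-search version. Instead of calling a general max-flow routine, it searches for augmenting paths directly, which is what max flow amounts to on this particular network where each set needs one unit and each person can supply k. The names feasible and min_max_selections are made up, and every set is assumed to be non-empty.

```python
def feasible(sets, k):
    """Can we pick one person per set with nobody picked more than k times?"""
    assigned = {}  # person -> list of set indices currently using them

    def try_assign(s, seen):
        for person in sets[s]:
            if person in seen:
                continue
            seen.add(person)
            holders = assigned.setdefault(person, [])
            if len(holders) < k:
                holders.append(s)          # person still has spare capacity
                return True
            # Person is full: try to move one of their sets elsewhere.
            for idx, other in enumerate(holders):
                if try_assign(other, seen):
                    holders[idx] = s
                    return True
        return False

    return all(try_assign(s, set()) for s in range(len(sets)))

def min_max_selections(sets):
    """Smallest k for which feasible(sets, k) holds (binary search)."""
    lo, hi = 1, len(sets)   # k = len(sets) always works for non-empty sets
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(sets, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

sets = [["p1", "p2", "p3"], ["p2", "p3"], ["p1"], ["p1"]]
print(min_max_selections(sets))
```

The binary search is valid because feasibility is monotonic in k, as argued above.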
Problem description
There are different categories which contain an arbitrary amount of elements.
There are three different attributes A, B and C. Each element has its own distribution of these attributes, expressed as positive integer values. For example, element 1 has the attributes A: 42 B: 1337 C: 18. The sum of the attributes is not the same across elements; some elements have more than others.
Now the problem:
We want to choose exactly one element from each category so that
We hit a certain threshold on attributes A and B (going over it is also possible, but not necessary)
while getting a maximum amount of C.
Example: we want to hit at least 80 A and 150 B in sum over all chosen elements and want as many C as possible.
I've thought about this problem and cannot imagine an efficient solution. The sample sizes are about 15 categories, each of which contains up to ~30 elements, so brute-forcing doesn't seem very effective since there are potentially 30^15 possibilities.
My model is a tree whose depth equals the number of categories. Each depth level represents a category and gives us the choice of one element from that category. When passing over a node, we add the attributes of the represented element to the running sums we want to optimize.
If we hit the same attribute combination multiple times on the same level, we merge them so that we avoid recomputing values we have already computed. If we reach a level where one path has lower values in all three attributes than another path, we don't follow it any further.
However, in the worst case this tree still has ~30^15 nodes in it.
Can anybody think of an algorithm that may help me solve this problem? Or could you explain why you think no such algorithm exists?
This question is very similar to a variation of the knapsack problem. I would start by looking at solutions for this problem and see how well you can apply it to your stated problem.
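One possible knapsack-style sketch in Python, offered as an illustration rather than a definitive solution: process the categories one at a time and keep, for each pair of A and B totals, only the best C seen so far. Capping A and B at their thresholds is an extra assumption of this sketch (any surplus is irrelevant to feasibility), and it keeps the number of states at most (need_a + 1) * (need_b + 1) per level. The name best_c and the sample data are invented.

```python
from itertools import product

def best_c(categories, need_a, need_b):
    """Max total C when picking one (a, b, c) element per category.

    Returns None if the thresholds on A and B cannot be reached.
    """
    states = {(0, 0): 0}   # (capped A, capped B) -> best C so far
    for elements in categories:
        nxt = {}
        for (a, b), c in states.items():
            for ea, eb, ec in elements:
                key = (min(need_a, a + ea), min(need_b, b + eb))
                value = c + ec
                # Merge paths that reach the same combination.
                if key not in nxt or nxt[key] < value:
                    nxt[key] = value
        states = nxt
    return states.get((need_a, need_b))

# Hypothetical data: three categories with two elements each.
categories = [
    [(50, 10, 5), (10, 100, 7)],
    [(40, 60, 3), (30, 150, 9)],
    [(5, 5, 20), (60, 90, 1)],
]
answer = best_c(categories, need_a=80, need_b=150)

# Exhaustive check, only practical for tiny inputs like this one.
brute = max(
    (sum(e[2] for e in pick) for pick in product(*categories)
     if sum(e[0] for e in pick) >= 80 and sum(e[1] for e in pick) >= 150),
    default=None,
)
print(answer, brute)
```

With thresholds like 80 and 150 and about 15 categories of 30 elements, this touches on the order of a few million state/element pairs instead of 30^15 combinations.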
My first inclination is to try branch-and-bound. You can do it breadth-first or depth-first, and I prefer depth-first because I think it's cleaner.
To express it simply, you have a tree-walk procedure walk that can enumerate all possibilities (maybe it just has a 5-level nested loop). It is augmented with two things:
At every step of the way, it keeps track of the cost at that point, where the cost can only increase. (If the cost can also decrease, it becomes more like a minimax game tree search.)
The procedure has an argument budget, and it does not search any branches where the cost can exceed the budget.
Then you have an outer loop:
for (budget = 0; budget < ... ; budget++){
    walk(budget);
    // if walk finds a solution within the budget, halt
}
The amount of time it takes is exponential in the budget, so easier cases will take less time. The fact that you are re-doing the search doesn't matter much because each level of the budget takes as much or more time than all the previous levels combined.
Combine this with some sort of heuristic about the order in which you consider branches, and it may give you a workable solution for typical problems you give it.
If that doesn't work, you can fall back on basic heuristic programming. That is, do some cases by hand, and pay attention to how you did it. Then program it the same way.
I hope that helps.