Implementing a popularity based priority graph - algorithm

I want to implement a graph where nodes are items and outward edges are the popularity of the next item. (e.g. I've done this task, what are the most popular tasks performed after it?) My initial algorithm simply incremented a popularity relationship each time it was traversed, but this yields three problems:
First, there are potentially many (100,000+) items of up to 10-15 Unicode characters, and I'd like to keep the total space as small as possible. Second, each (relationship) number runs the risk of overflowing, and dividing the popularity by two each time a value approaches the edge is time consuming and loses a lot of accuracy as far as the differences in the popularity of other items.
(i.e. Assume a-4->b and a-5->c and a-255->d. With one byte, a-255->d will overflow at the next increment, but dividing the relationships by 2 will give: a-2->b and a-2->c and a-127->d)
Third, it makes it difficult for less popular edges to gain popularity.
I've also thought about a queue-like structure where each transition en-queues the next item and de-queues the oldest one when the queue is full. However, this has the problem of being too dynamic if the queue is too small and of eating up a HUGE amount of space, even if the queue is only ten elements.
Any ideas on algorithms/data structures for approaching this problem?

Related

Sorting in Computer Science vs. sorting in the 'real' world

I was thinking about sorting algorithms in software, and possible ways one could surmount the O(nlogn) roadblock. I don't think it IS possible to sort faster in a practical sense, so please don't think that I do.
With that said, it seems with almost all sorting algorithms, the software must know the position of each element. Which makes sense, otherwise, how would it know where to place each element according to some sorting criteria?
But when I crossed this thinking with the real world, a centrifuge has no idea what position each molecule is in when it 'sorts' the molecules by density. In fact, it doesn't care about the position of each molecule. However it can sort trillions upon trillions of items in a relatively short period of time, due to the fact that each molecule follows density and gravitational laws - which got me thinking.
Would it be possible with some overhead on each node (some value or method tacked on to each of the nodes) to 'force' the order of the list? Something like a centrifuge, where only each element cares about its relative position in space (in relation to other nodes). Or, does this violate some rule in computation?
I think one of the big points brought up here is the quantum mechanical effects of nature and how they apply in parallel to all particles simultaneously.
Perhaps classical computers inherently restrict sorting to the domain of O(nlogn), where as quantum computers may be able to cross that threshold into O(logn) algorithms that act in parallel.
The point that a centrifuge being basically a parallel bubble sort seems to be correct, which has a time complexity of O(n).
I guess the next thought is that if nature can sort in O(n), why can't computers?
EDIT: I had misunderstood the mechanism of a centrifuge and it appears that it does a comparison, a massively-parallel one at that. However there are physical processes that operate on a property of the entity being sorted rather than comparing two properties. This answer covers algorithms that are of that nature.
A centrifuge applies a sorting mechanism that doesn't really work by means of comparisons between elements, but actually by a property ('centrifugal force') on each individual element in isolation.Some sorting algorithms fall into this theme, especially Radix Sort. When this sorting algorithm is parallelized it should approach the example of a centrifuge.
Some other non-comparative sorting algorithms are Bucket sort and Counting Sort. You may find that Bucket sort also fits into the general idea of a centrifuge (the radius could correspond to a bin).
Another so-called 'sorting algorithm' where each element is considered in isolation is the Sleep Sort. Here time rather than the centrifugal force acts as the magnitude used for sorting.
Computational complexity is always defined with respect to some computational model. For example, an algorithm that's O(n) on a typical computer might be O(2n) if implemented in Brainfuck.
The centrifuge computational model has some interesting properties; for example:
it supports arbitrary parallelism; no matter how many particles are in the solution, they can all be sorted simultaneously.
it doesn't give a strict linear sort of particles by mass, but rather a very close (low-energy) approximation.
it's not feasible to examine the individual particles in the result.
it's not possible to sort particles by different properties; only mass is supported.
Given that we don't have the ability to implement something like this in general-purpose computing hardware, the model may not have practical relevance; but it can still be worth examining, to see if there's anything to be learned from it. Nondeterministic algorithms and quantum algorithms have both been active areas of research, for example, even though neither is actually implementable today.
The trick is there, that you only have a probability of sorting your list using a centrifuge. As with other real-world sorts [citation needed], you can change the probability that your have sorted your list, but never be certain without checking all the values (atoms).
Consider the question: "How long should you run your centrifuge for?"
If you only ran it for a picosecond, your sample may be less sorted than the initial state.. or if you ran it for a few days, it may be completely sorted. However, you wouldn't know without actually checking the contents.
A real world example of a computer based "ordering" would be autonomous drones that cooperatively work with each other, known as "drone swarms". The drones act and communicate both as individuals and as a group, and can track multiple targets. The drones collectively decide which drones will follow which targets and the obvious need to avoid collisions between drones. The early versions of this were drones that moved through way points while staying in formation, but the formation could change.
For a "sort", the drones could be programmed to form a line or pattern in a specific order, initially released in any permutation or shape, and collectively and in parallel they would quickly form the ordered line or pattern.
Getting back to a computer based sort, one issue is that there's one main memory bus, and there's no way for a large number of objects to move about in memory in parallel.
know the position of each element
In the case of a tape sort, the position of each element (record) is only "known" to the "tape", not to the computer. A tape based sort only needs to work with two elements at a time, and a way to denote run boundaries on a tape (file mark, or a record of different size).
IMHO, people overthink log(n). O(nlog(n)) IS practically O(n). And you need O(n) just to read the data.
Many algorithms such as quicksort do provide a very fast way to sort elements. You could implement variations of quicksort that would be very fast in practice.
Inherently all physical systems are infinitely parallel. You might have a buttload of atoms in a grain of sand, nature has enough computational power to figure out where each electron in each atom should be. So if you had enough computational resources (O(n) processors) you could sort n numbers in log(n) time.
From comments:
Given a physical processor that has k number of elements, it can achieve a parallelness of at most O(k). If you process n numbers arbitrarily, it would still process it at a rate related to k. Also, you could formulate this problem physically. You could create n steel balls with weights proportional to the number you want to encode, which could be solved by a centrifuge in a theory. But here the amount of atoms you are using is proportional to n. Whereas in a standard case you have a limited number of atoms in a processor.
Another way to think about this is, say you have a small processor attached to each number and each processor can communicate with its neighbors, you could sort all those numbers in O(log(n)) time.
I worked in an office summers after high school when I started college. I had studied in AP Computer Science, among other things, sorting and searching.
I applied this knowledge in several physical systems that I can recall:
Natural merge sort to start…
A system printed multipart forms including a file-card-sized tear off, which needed to be filed in a bank of drawers.
I started with a pile of them and sorted the pile to begin with. The first step is picking up 5 or so, few enough to be easily placed in order in your hand. Place the sorted packet down, criss-crossing each stack to keep them separate.
Then, merge each pair of stacks, producing a larger stack. Repeat until there is only one stack.
…Insertion sort to complete
It is easier to file the sorted cards, as each next one is a little farther down the same open drawer.
Radix sort
This one nobody else understood how I did it so fast, despite repeated tries to teach it.
A large box of check stubs (the size of punch cards) needs to be sorted. It looks like playing solitaire on a large table—deal out, stack up, repeat.
In general
30 years ago, I did notice what you’re asking about: the ideas transfer to physical systems quite directly because there are relative costs of comparisons and handling records, and levels of caching.
Going beyond well-understood equivalents
I recall an essay about your topic, and it brought up the spaghetti sort. You trim a length of dried noodle to indicate the key value, and label it with the record ID. This is O(n), simply processing each item once.
Then you grab the bundle and tap one end on the table. They align on the bottom edges, and they are now sorted. You can trivially take off the longest one, and repeat. The read-out is also O(n).
There are two things going on here in the “real world” that don’t correspond to algorithms. First, aligning the edges is a parallel operation. Every data item is also a processor (the laws of physics apply to it). So, in general, you scale the available processing with n, essentially dividing your classic complexity by a factor on n.
Second, how does aligning the edges accomplish a sort? The real sorting is in the read-out which lets you find the longest in one step, even though you did compare all of them to find the longest. Again, divide by a factor of n, so finding the largest is now O(1).
Another example is using analog computing: a physical model solves the problem “instantly” and the prep work is O(n). In principle the computation is scaling with the number of interacting components, not the number of prepped items. So the computation scales with n². The example I'm thinking of is a weighted multi-factor computation, which was done by drilling holes in a map, hanging weights from strings passing through the holes, and gathering all the strings on a ring.
Sorting is still O(n) total time. That it is faster than that is because of Parallelization.
You could view a centrifuge as a Bucketsort of n atoms, parallelized over n cores(each atom acts as a processor).
You can make sorting faster by parallelization but only by a constant factor because the number of processors is limited, O(n/C) is still O(n) (CPUs have usually < 10 cores and GPUs < 6000)
The centrifuge is not sorting the nodes, it applies applies a force to them then they react in parallel to it.
So if you were to implement a bubble sort where each node is moving itself in parallel up or down based on it's "density", you'd have a centrifuge implementation.
Keep in mind that in the real world you can run a very large amount of parallel tasks where in a computer you can have a maximum of real parallel tasks equals to the number of physical processing units.
In the end, you would also be limited with the access to the list of elements because it cannot be modified simultaneously by two nodes...
Would it be possible with some overhead on each node (some value or
method tacked on to each of the nodes) to 'force' the order of the
list?
When we sort using computer programs we select a property of the values being sorted. That's commonly magnitude of the number or the alphabetical order.
Something like a centrifuge, where only each element cares about its
relative position in space (in relation to other nodes)
This analogy aptly reminds me of simple bubble sort. How smaller numbers bubble up in each iteration. Like your centrifuge logic.
So to answer this, don't we actually do something of that sort in software based sorting?
First of all, you are comparing two different contexts, one is logic(computer) and the other is physics which (so far) is proven that we can model some parts of it using mathematical formulas and we as programmers can use this formulas to simulate (some parts of) physics in the logic work (e.g physics engine in game engine).
Second We have some possibilities in the computer (logic) world that is nearly impossible in physics for example we can access memory and find the exact location of each entity at each time but in physics that is a huge problem Heisenberg's uncertainty principle.
Third If you want to map centrifuges and its operation in real world, to computer world, it is like someone (The God) has given you a super-computer with all the rules of physics applied and you are doing your small sorting in it (using centrifuge) and by saying that your sorting problem was solved in o(n) you are ignoring the huge physics simulation going on in background...
Consider: is "centrifuge sort" really scaling better? Think about what happens as you scale up.
The test tubes have to get longer and longer.
The heavy stuff has to travel further and further to get to the bottom.
The moment of inertia increases, requiring more power and longer times to accelerate up to sorting speed.
It's also worth considering other problems with centrifuge sort. For example, you can only operate on a narrow size scale. A computer sorting algorithm can handle integers from 1 to 2^1024 and beyond, no sweat. Put something that weighs 2^1024 times as much as a hydrogen atom into a centrifuge and, well, that's a black hole and the galaxy has been destroyed. The algorithm failed.
Of course the real answer here is that computational complexity is relative to some computational model, as mentioned in other answer. And "centrifuge sort" doesn't make sense in the context of common computational models, such as the RAM model or the IO model or multitape Turing machines.
Another perspective is that what you're describing with the centrifuge is analogous to what's been called the "spaghetti sort" (https://en.wikipedia.org/wiki/Spaghetti_sort). Say you have a box of uncooked spaghetti rods of varying lengths. Hold them in your fist, and loosen your hand to lower them vertically so the ends are all resting on a horizontal table. Boom! They're sorted by height. O(constant) time. (Or O(n) if you include picking the rods out by height and putting them in a . . . spaghetti rack, I guess?)
You can note there that it's O(constant) in the number of pieces of spaghetti, but, due to the finite speed of sound in spaghetti, it's O(n) in the length of the longest strand. So nothing comes for free.

Ideas for heuristically solving travelling salesman with extra constraints

I'm trying to come up with a fast and reasonably optimal algorithm to solve the following TSP/hamiltonian-path-like problem:
A delivery vehicle has a number of pickups and dropoffs it needs to
perform:
For each delivery, the pickup needs to come before the
dropoff.
The vehicle is quite small and the packages vary in size.
The total carriage cannot exceed some upper bound (e.g. 1 cubic
metre). Each delivery has a deadline.
The planner can run mid-route, so the vehicle will begin with a number of jobs already picked up and some capacity already taken up.
A near-optimal solution should minimise the total cost (for simplicity, distance) between each waypoint. If a solution does not exist because of the time constraints, I need to find a solution that has the fewest number of late deliveries. Some illustrations of an example problem and a non-optimal, but valid solution:
I am currently using a greedy best first search with backtracking bounded to 100 branches. If it fails to find a solution with on-time deliveries, I randomly generate as many as I can in one second (the most computational time I can spare) and pick the one with the fewest number of late deliveries. I have looked into linear programming but can't get my head around it - plus I would think it would be inappropriate given it needs to be run very frequently. I've also tried algorithms that require mutating the tour, but the issue is mutating a tour nearly always makes it invalid due to capacity constraints and precedence. Can anyone think of a better heuristic approach to solving this problem? Many thanks!
Safe Moves
Here are some ideas for safely mutating an existing feasible solution:
Any two consecutive stops can always be swapped if they are both pickups, or both deliveries. This is obviously true for the "both deliveries" case; for the "both pickups" case: if you had room to pick up A, then pick up B without delivering anything in between, then you have room to pick up B first, then pick up A. (In fact a more general rule is possible: In any pure-delivery or pure-pickup sequence of consecutive stops, the stops can be rearranged arbitrarily. But enumerating all the possibilities might become prohibitive for long sequences, and you should be able to get most of the benefit by considering just pairs.)
A pickup of A can be swapped with any later delivery of something else B, provided that A's original pickup comes after B was picked up, and A's own delivery comes after B's original delivery. In the special case where the pickup of A is immediately followed by the delivery of B, they can always be swapped.
If there is a delivery of an item of size d followed by a pickup of an item of size p, then they can be swapped provided that there is enough extra room: specifically, provided that f >= p, where f is the free space available before the delivery. (We already know that f + d >= p, otherwise the original schedule wouldn't be feasible -- this is a hint to look for small deliveries to apply this rule to.)
If you are starting from purely randomly generated schedules, then simply trying all possible moves, greedily choosing the best, applying it and then repeating until no more moves yield an improvement should give you a big quality boost!
Scoring Solutions
It's very useful to have a way to score a solution, so that they can be ordered. The nice thing about a score is that it's easy to incorporate levels of importance: just as the first digit of a two-digit number is more important than the second digit, you can design the score so that more important things (e.g. deadline violations) receive a much greater weight than less important things (e.g. total travel time or distance). I would suggest something like 1000 * num_deadline_violations + total_travel_time. (This assumes of course that total_travel_time is in units that will stay beneath 1000.) We would then try to minimise this.
Managing Solutions
Instead of taking one solution and trying all the above possible moves on it, I would instead suggest using a pool of k solutions (say, k = 10000) stored in a min-heap. This allows you to extract the best solution in the pool in O(log k) time, and to insert new solutions in the same time.
You could initially populate the pool with randomly generated feasible solutions; then on each step, you would extract the best solution in the pool, try all possible moves on it to generate child solutions, and insert any child solutions that are better than their parent back into the pool. Whenever the pool doubles in size, pull out the first (i.e. best) k solutions and make a new min-heap with them, discarding the old one. (Performing this step after the heap grows to a constant multiple of its original size like this has the nice property of leaving the amortised time complexity unchanged.)
It can happen that some move on solution X produces a child solution Y that is already in the pool. This wastes memory, which is unfortunate, but one nice property of the min-heap approach is that you can at least handle these duplicates cheaply when they arrive at the front of the heap: all duplicates will have identical scores, so they will all appear consecutively when extracting solutions from the top of the heap. Thus to avoid having duplicate solutions generate duplicate children "down through the generations", it suffices to check that the new top of the heap is different from the just-extracted solution, and keep extracting and discarding solutions until this holds.
A note on keeping worse solutions: It might seem that it could be worthwhile keeping child solutions even if they are slightly worse than their parent, and indeed this may be useful (or even necessary to find the absolute optimal solution), but doing so has a nasty consequence: it means that it's possible to cycle from one solution to its child and back again (or possibly a longer cycle). This wastes CPU time on solutions we have already visited.
You are basically combining the Knapsack Problem with the Travelling Salesman Problem.
Your main problem here seems to be actually the Knapsack Problem, rather then the Travelling Salesman Problem, since it has the one hard restriction (maximum delivery volume). Maybe try to combine the solutions for the Knapsack Problem with the Travelling Salesman.
If you really only have one second max for calculations a greedy algorithm with backtracking might actually be one of the best solutions that you can get.

Finding the average of large list of numbers

Came across this interview question.
Write an algorithm to find the mean(average) of a large list. This
list could contain trillions or quadrillions of number. Each number is
manageable in hundreds, thousands or millions.
Googling it gave me all Median of Medians solutions. How should I approach this problem?
Is divide and conquer enough to deal with trillions of number?
How to deal with the list of the such a large size?
If the size of the list is computable, it's really just a matter of how much memory you have available, how long it's supposed to take and how simple the algorithm is supposed to be.
Basically, you can just add everything up and divide by the size.
If you don't have enough memory, dividing first might work (Note that you will probably lose some precision that way).
Another approach would be to recursively split the list into 2 halves and calculating the mean of the sublists' means. Your recursion termination condition is a list size of 1, in which case the mean is simply the only element of the list. If you encounter a list of odd size, make either the first or second sublist longer, this is pretty much arbitrary and doesn't even have to be consistent.
If, however, you list is so giant that its size can't be computed, there's no way to split it into 2 sublists. In that case, the recursive approach works pretty much the other way around. Instead of splitting into 2 lists with n/2 elements, you split into n/2 lists with 2 elements (or rather, calculate their mean immediately). So basically, you calculate the mean of elements 1 and 2, that becomes you new element 1. the mean of 3 and 4 is your new second element, and so on. Then apply the same algorithm to the new list until only 1 element remains. If you encounter a list of odd size, either add an element at the end or ignore the last one. If you add one, you should try to get as close as possible to your expected mean.
While this won't calculate the mean mathematically exactly, for lists of that size, it will be sufficiently close. This is pretty much a mean of means approach. You could also go the median of medians route, in which case you select the median of sublists recursively. The same principles apply, but you will generally want to get an odd number.
You could even combine the approaches and calculate the mean if your list is of even size and the median if it's of odd size. Doing this over many recursion steps will generate a pretty accurate result.
First of all, this is an interview question. The problem as stated would not arise in practice. Also, the question as stated here is imprecise. That is probably deliberate. (They want to see how you deal with solving an imprecisely specified problem.)
Write an algorithm to find the mean(average) of a large list.
The word "find" is rubbery. It could mean calculate (to some precision) or it could mean estimate.
The phrase "large list" is rubbery. If could mean a list or array data structure in memory, or the "list" could be the result of a database query, the contents of a file or files.
There is no mention of the hardware constraints on the system where this will be implemented.
So the first thing >>I<< would do would be to try to narrow the scope by asking some questions of the interviewer.
But assuming that you can't, then a complete answer would need to cover the following points:
The dataset probably won't fit in memory at the same time. (But if it does, then that is good.)
Calculating the average of N numbers is O(N) if you do it serially. For N this size, it could be an intractable problem.
An alternative is to split into sublists of equals size and calculate the averages, and the average of the averages. In theory, this gives you O(N/P) where P is the number of partitions. The parallelism could be implemented with multiple threads, with multiple processes on the same machine, or distributed.
In practice, the limiting factors are going to be computational, memory and/or I/O bandwidth. A parallel solution will be effective if you can address these limits. For example, you need to balance the problem of each "worker" having uncontended access to its "sublist" versus the problem of making copies of the data so that that can happen.
If the list is represented in a way that allows sampling, then you can estimate the average without looking at the entire dataset. In fact, this could be O(C) depending on how you sample. But there is a risk that your sample will be unrepresentative, and the average will be too inaccurate.
In all cases doing calculations, you need to guard against (integer) overflow and (floating point) rounding errors. Especially while calculating the sums.
It would be worthwhile discussing how you would solve this with a "big data" platform (e.g. Hadoop) and the limitations of that approach (e.g. time taken to load up the data ...)

sorting algorithm to keep sort position numbers updated

Every once in a while I must deal with a list of elements that the user can sort manually.
In most cases I try to rely on a model using an order sensitive container, however this is not always possible and resort to adding a position field to my data. This position field is a double type, therefore I can always calculate a position between two numbers. However this is not ideal, because I am concerned about reaching an edge case where I do not have enough numerical precision to continue inserting between two numbers.
I am having doubts about the best approach to maintain my position numbers. The first thought is traversing all the rows and give them a round number after every insertion, like:
Right after dropping a row between 2 and 3:
1 2 2.5 3 4 5
After position numbers update:
1 2 3 4 5 6
That of course, might get heavy if I have a high number of entries. Not specially in memory, but to store all new values back to the disk/database. I usually work with some type of ORM and mobile software. Updating all the codes will pull out of disk every object and will set them as dirty, leading to a re-verification of all the related validation rules of my data model.
I could also wait until the precision is not enough to calculate a number between two positions. However the user experience would be bad, since the same operation will no longer require the same amount of time.
I believe that there is an standard algorithm for these cases that regularly and consistently keep the position numbers updated, or just some of them. Ideally it should be O(log n), with no big time differences between the worst and best cases.
Being honest I also think that anything that must be user/sorted, cannot grow as large as to become a real problem in its worst case. The edge case seems also to be extremely rare, even more if I search a solution pushing the border numbers. However I still believe that there is an standard well known solution for this problem which I am not aware of, and I would like to learn about it.
Second try.
Consider the full range of position values, say 0 -> 1000
The first item we insert should have a position of 500. Our list is now :
(0) -> 500 -> (1000).
If you insert another item at first position, we end up with :
(0) -> 250 -> 500 -> (1000).
If we keep inserting items at first position, we gonna have a problem, as our ranges are not equally balanced and... Wait... balanced ? Doesn't it sounds like a binary tree problem !?
Basically, you store your list as a binary tree. When inserting a node, you assign it a position according to surrounding nodes. When your tree become unbalanced, you rotate nodes to make it balanced again and you recompute position for rotated nodes !
So :
Most of the time, adding a node will not require to change position of other nodes.
When balancing is required, only a subset of your items will be changed.
It's O(log n) !
EDIT
If the user is actually sorting the list manually, then is there really any need to worry about taking O(n) to record the new order? It's O(n) in any case just to display the list to the user.
This not really answers the question but...
As you talked about "adding a position field to your data", I suppose that your data store is a relational database and that your data has some kind of identifier.
So maybe you can implement a doubly linked list by adding a previous_data_id and next_data_id to your data. Insert/move/remove operations thus are O(1).
Loading such a collection from a database is rather easy:
Fetch each item and add them to a map with their id as key.
For each item connect it with its previous and next item.
Starting with the first item (previous_data_id is undefined) follow the chain and add them to a list.
After some days with no valid answer. This is my theory:
The real challenge here is a practical solution. Maybe there is a mathematical correct solution, but every day that goes by, it seems that the implementation would be of a great complexity. A good solution should not only be mathematically correct, but also balanced with the nature the problem, the low chances to meet it, and its minor implications. Like how useless it could be killing flies with bullets, although extremely effective.
I am starting to believe that a good answer could be: to the hell with the right solution, leave it like one line calculation and live with the rare case where sorting of two elements might fail. It is not worth to increase complexity and invest time or money in such nity-picky problem, so rare, that causes no data damage, just a temporal UX glitch.

Find "minimum branch" of a tree - solution to warehouse problem

I have a warehouse in which I keep goods. Each article occupies given volume of warehouse space and costs give amount of dollars.
Now, because warehouse is full and I'm expecting new delivery, I have to free some space - no less than the new delivery will occupy, but I also have to minimize my loses. In other words, I have to empty at least X cubic meters of warehouse space by throwing out some articles, making sure that value of those articles is the minimum value possible.
Example:
If X=10m3, then I prefer to throw out item that occupies 20m3 and is worth 1000$ than item that occupies exactly 10m3 but is worth 2000$.
Of course in the real calculation it will be necessary to throw out more than one item, probably 5, 10 or maybe even 20.
What I'm thinking of is to represent the above problem as a tree, with nodes representing volume of emptied space and edges representing lost value, then searching for nodes that have value greater or equal to X and then choosing the node that has the lowest lost value along the edges from tree root to that node.
What I'm not sure if this is good approach because for e.g. 100 items in warehouse full tree would have 100 nodes on first depth level, 100*99=9900 nodes on second level etc, so this quickly starts to be prohitibive.
Is there better approach, or maybe even some proven algorithm (that I'm not aware of) for such kind of problems?
This is known as the Knapsack problem.
Change your problem a little to make it compliant with the knapsack: list all the items you have in the warehouse + all the items you wish to add. Search for the best bang for the buck.
Well, in this case it means you would not try to clear space in your warehouse for junk, which I don't think is too stupid.
Ah, in case you didn't follow the link, it NP-complete, so you're in for a word of hurt if you want the best solution :)

Resources