Balanced Partition greedy approach - algorithm

I was looking at the balanced partitioning problem here and here (problem 7).
The problem basically asks to partition a given array of numbers into 2 subsets (S1 and S2) such that the absolute difference between the sums of the numbers in S1 and S2, |sum(S1) - sum(S2)|, is minimal. One thing I didn't understand is why nobody suggests a greedy approach:
def balanced_partition(lst):
    idx = 0
    S1 = 0
    S2 = 0
    result_partition = [None] * len(lst)
    while idx < len(lst):
        new_S1 = S1 + lst[idx]
        new_S2 = S2 + lst[idx]
        if abs(new_S1 - S2) < abs(new_S2 - S1):
            result_partition[idx] = 1
            S1 = new_S1
        else:
            result_partition[idx] = 2
            S2 = new_S2
        idx += 1
    print("final sums s1 = {S1} and s2 = {S2} ".format(S1=S1, S2=S2))
    return result_partition
What is wrong with my approach? It seems to pass all the test cases I can come up with.

The simple counterexample is [1,1,1,1,1,1,6]. The greedy approach will spread the ones between the two sets, while the optimal solution is [1,1,1,1,1,1],[6].
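For instance, running the function from the question on that input (relying on the code posted above):

balanced_partition([1, 1, 1, 1, 1, 1, 6])
# prints "final sums s1 = 3 and s2 = 9" and returns [2, 1, 2, 1, 2, 1, 2],
# a difference of 6, while [1, 1, 1, 1, 1, 1] versus [6] gives a difference of 0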

There is nothing wrong with your implementation as such. However, if you consider all subsets in this particular problem, you may find a better answer than the greedy output. Even the wiki page that you shared has some examples.
You probably already know the difference between the two approaches. A greedy algorithm will often give you a pretty good result, close to or sometimes equal to the best one, but to be sure you have the optimum you have to consider all options. The dynamic programming approach effectively checks all possible subsets; because it saves the results of previously computed sub-problems, it is faster than brute force.
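For reference, here is a minimal sketch of that DP idea in Python (my own code, not from the linked pages): it tracks which subset sums are reachable, assumes non-negative integers, and returns only the minimum difference rather than the partition itself.

def balanced_partition_dp(lst):
    total = sum(lst)
    reachable = {0}                              # subset sums achievable so far
    for x in lst:
        reachable |= {s + x for s in reachable}
    # the best split puts some reachable sum s on one side and total - s on the other
    return min(abs(total - 2 * s) for s in reachable)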
The question is when to use a greedy approach and when to use dynamic programming. I have done some competitive programming, and when I see a DP problem (partitioning, subset sum, knapsack and so on) I sometimes come up with a greedy solution first, because greedy solutions are usually more obvious; people use greedy reasoning all the time in daily life. Before implementing, I test my algorithm with examples, and if I convince myself that it is the right approach, I implement it. It is kind of intuitive in a way.
If you find a test case that should have a better answer, it most probably means you need a DP solution. If you get WA from the judge system, it means you haven't found good test cases yourself, but that's okay: you don't have to find that exact test case, because it won't help you find a better solution anyway.

Related

Dynamic Programming Solution for Activity-selection

In section 16.1, An activity-selection problem, of Introduction to Algorithms, the dynamic programming solution for this problem is given as
c[i, j] = 0 if S(i, j) is empty
c[i, j] = max { c[i, k] + c[k, j] + 1 } if S(i, j) is not empty
where S(i, j) denotes the set of activities that start after activity a(i) finishes and that finish before activity a(j) starts, and c[i, j] denotes the size of an optimal solution for the set S(i, j)
However, I am thinking of another simpler solution
c[i] = max { c[i - 1], c[f(i)] + 1 }
where f(i) gives the activity that is compatible with a(i) and has the max finish time and finishes before a(i) starts.
Will this work? If yes, why does the author provide this complex solution? If not, what am I missing?
I think you are missing many details of designing the dp solution.
What is the initial value?
What is the base case?
What happens if there are several activities compatible with a(i) that have the same finishing time?
When designing a DP solution, one of the properties needed is optimal substructure.
The order in which a particular state (i.e. c[i]) is computed matters: it can only be computed from its subproblems. Your solution does not meet this requirement: when you compute c[i] you have to compute c[j] first, with j = f(i); if we assume j > i (or even j = i+1), then you would also have to compute c[i] before computing c[j]! So c[i] depends on c[j] while c[j] depends on c[i] ==> not correct
Another example very similar to this question is Matrix chain multiplication.
You may want to have a look :)
Edit:
After seeing your edit to the question, here's my response:
Assuming you can precompute f(i) in reasonable time (which you obviously can), your solution is correct, as it IS the greedy solution other answers have told you about.
The reason it works is quite straightforward, literally speaking:
up to the i-th activity, you either choose activity i (that's the c[f(i)]+1 part) or you don't choose it (that's the c[i-1] part).
You can try to construct a formal proof as well; the correctness of a greedy method can usually be proved by contradiction (roughly speaking, try to see why it is NOT possible to have a set larger than c[i-1] if you do not choose activity i, and similarly for the case where you do choose activity i).
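For concreteness, here is one way that recurrence could be coded (my own sketch, not the book's): sort the activities by finish time and find f(i) with a binary search.

import bisect

def max_activities(activities):
    # activities: (start, finish) pairs; computes c[i] = max(c[i-1], c[f(i)] + 1)
    acts = sorted(activities, key=lambda a: a[1])
    finishes = [f for _, f in acts]
    c = [0] * (len(acts) + 1)                    # c[0] = 0 for the empty prefix
    for i, (start, _) in enumerate(acts, 1):
        j = bisect.bisect_right(finishes, start, 0, i - 1)   # f(i): last activity finishing by this start
        c[i] = max(c[i - 1], c[j] + 1)
    return c[-1]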
To answer your question about why the writer demonstrates the DP solution: I think it's somewhat outside the programming context, but my take is that the author is trying to demonstrate two different ways to solve a problem, and furthermore to illustrate an idea: given a problem that can be solved by a greedy method, it can also be solved by DP, but that IS OVERKILL.
The writer then tries to help the reader recognize the difference between greedy and DP, as the two look quite similar to a new learner. That's why the writer first gives the DP solution to show the pain, then the greedy solution, and lastly a paragraph, Greedy versus DP, on page 382.
So, TL;DR: your solution is correct, as it is basically the greedy method for this problem, and of course it is much simpler than the DP solution given in the book; that IS the point the book is trying to illustrate.
A quote from the book at p. 382: "...One might be tempted to generate a dp solution to a problem when a greedy solution suffices, or one might mistakenly think that a greedy solution suffices... when a dp solution is required..."
Will this work?
Yes, that will work too.
why does the author provide this complex solution?
That wasn't their proposed solution, but a part of the analysis of the problem. In the next paragraph after the equation you cited the authors say:
But we would be overlooking another important characteristic of the activity-selection problem that we can use to great advantage.
The final solution (Greedy-Activity-Selector) is similar to yours, but even simpler.
Their point, as I understand it, is that a DP solution can be built almost mechanically (as described in section 15.3), without considering the specifics of that particular problem, but coming up with a better algorithm requires some insight into the problem beyond the optimal substructure.
Your solution relies on the theorem 16.1, but once the theorem is proven, it doesn't make sense to create another DP algorithm, because you already know enough about the problem to create a simpler greedy algorithm.
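For comparison, the greedy algorithm the book ends up with amounts to something like this (a rough Python rendering, not the book's pseudocode):

def greedy_activity_selector(activities):
    chosen, last_finish = [], float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):   # by finish time
        if start >= last_finish:             # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen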

Find the priority function / alphabet order for extreme higher order elements relation

This question is an extension to the following one. The difference is that now our function to optimize will have higher order relations between elements:
We have an array of elements a1, a2, ..., aN over an alphabet E, assuming N >> |E|.
For each symbol of the alphabet we define a unique integer priority = V(sym). Let's define V{i} := V(symbol(ai)) for simplicity.
The task is to find a priority function V for which:
Count(i)->MIN | V{i} > V{i+1} <= V{i+2}
In other words, I need to find the priorities / permutation of the alphabet for which the number of positions i, satisfying the condition V{i}>V{i+1}<=V{i+2}, is minimum.
Maximum required abstraction (low priority for me): I guess once the solution model for the initial question is extended to cover the first part of this one, extending it further (see below) will be easier.
Given a matrix of signs B of size MxK (basically B[i,j] is from the set {<,>,<=,>=}), find the priority function V for which:
Sum(for all j in range [1,M]) {Count(i)}->EXTREMUM | V{i} B[j,1] V{i+1} B[j,2] ... B[j,K] V{i+K}
As an example, find the priority function V, for which the number of i, satisfying V{i}<V{i+1}<V{i+2} or V{i}>V{i+1}>V{i+2}, is minimum.
My intuition is that all variations on this problem will prove to be NP-hard. So I'd begin looking for heuristics that produce reasonable answers. This may involve some trial and error.
A simplistic approach is to write down a possible permutation, then try possible swaps until you've arrived at a local minimum. Do this several times and pick the best answer.
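A rough Python sketch of that swap-based local search (names and structure are mine, just to make the idea concrete):

import random

def count_violations(seq, priority):
    # number of positions i with V{i} > V{i+1} <= V{i+2}
    v = [priority[s] for s in seq]
    return sum(1 for i in range(len(v) - 2) if v[i] > v[i + 1] <= v[i + 2])

def local_search(seq, alphabet, restarts=20):
    symbols = list(alphabet)
    best_priority, best_score = None, float("inf")
    for _ in range(restarts):
        order = symbols[:]
        random.shuffle(order)                    # random starting permutation
        priority = {sym: rank for rank, sym in enumerate(order)}
        score = count_violations(seq, priority)
        improved = True
        while improved:                          # keep swapping while it helps
            improved = False
            for i in range(len(symbols)):
                for j in range(i + 1, len(symbols)):
                    a, b = symbols[i], symbols[j]
                    priority[a], priority[b] = priority[b], priority[a]
                    new_score = count_violations(seq, priority)
                    if new_score < score:
                        score, improved = new_score, True
                    else:                        # the swap didn't help, undo it
                        priority[a], priority[b] = priority[b], priority[a]
        if score < best_score:
            best_priority, best_score = dict(priority), score
    return best_priority, best_score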
Simulated annealing provides a more sophisticated version of this approach, see http://en.wikipedia.org/wiki/Simulated_annealing for a description. It may take some experimentation to find a set of parameters that seems to converge relatively well.
Another idea is to look for a genetic algorithm. Based on a quick Google search it looks like the standard way to do this is to try to turn an NP-complete problem into a SAT problem, and then use a genetic algorithm on that problem. This approach would require turning this into a SAT problem in some reasonable way. Unfortunately it is not obvious to me how one would go about doing this reduction. Indeed in the first version that you had, your problem was closely connected to a classic NP-hard problem. The fact that it is labeled NP-hard rather than NP-complete is evidence that people haven't found a good way to transform it into a SAT problem. So if it isn't obvious how to turn the simple version into a SAT problem, then you are unlikely to convert the hard problem either.
But you could still try some variation on genetic algorithms. Mutation is pretty simple, just swap some elements around. One way to combine elements would be to take 3 permutations and use quicksort to find the combination as follows: take a random pivot, and then use "majority wins" to bucket elements into bigger and smaller. Sort each half in the same way.
I'm sorry that I can't just give you an approach and say, "This should work." You've got what looks like an open-ended research project, and the best I can do is give you some ideas about things you can try that might work reasonably well.

About an exercise appearing in TAOCP volume one's "Notes on the Exercises"

There is a question in TAOCP vol 1, in "Notes on Exercises" section, which goes something like:
"Prove that 13^3 = 2197. Generalize your answer. (This is a horrible kind of problem that the author has tried to avoid)."
Questions:
How would you actually go about proving this? (Direct multiplication is one way; another could be using the formula for (a+b)^3.) Does the solution require some method that allows us to make a generalization?
What is the generalization here ?
Why is this a horrible kind of problem ?
What are some other similarly horrible problems that you are aware of?
Appreciate any answers.
P.S. I apologize if the statement of the problem above makes it look like a homework problem, but it's not. I ask that people not tag this as a homework problem, so that more people can give answers.
I'd guess that he's alluding to proving it starting from just the Peano axioms: constructing the integers, then going on to formally show that 13^3 = 2197 is a natural, logical consequence of the definition of exponentiation.
We could generalize to show that, given a and b, there exists some integer c that is a^b.
This is a horrible kind of problem because most people find it uninteresting.
Similar sorts of problems can be found in a course on analysis (along with some far more interesting ones).
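For what it's worth, the (a+b)^3 route mentioned in the question works out directly:

13^3 = (10 + 3)^3
     = 10^3 + 3*(10^2)*3 + 3*10*(3^2) + 3^3
     = 1000 + 900 + 270 + 27
     = 2197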
I initially considered it as follows:
n^3 = n * n * n
log_n(n^3) = log_n(n * n * n)
log_n(n^3) = log_n(n) + log_n(n) + log_n(n)
3 = 1 + 1 + 1
3 = 3
This seems fairly circular in its use of logarithmic identities, but given where I'm at in my algorithms research, it was oddly comforting.
Got stuck at the same exercise and 'solved' it this way:
a^b = mult(i=1 to b) a
After a bit of thinking I came to the conclusion that this is a prime factorization (both 13 and 3 are primes). Look up Fermat's little theorem.
(I know it's an old thread, but maybe this will help somebody who is also seeking an answer to this exercise.)

What's the most insidious way to pose this problem?

My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the costs of the various permutations are not the same. The cost can be computed as the sum of the squares of the distance traveled between d(i-1) and di, where d0 is taken to be the base station, with the caveat that any segment that involves a change of direction costs 3 times as much (imagine this is going on on a railroad or a pneumatic tube, and backing up disrupts other traffic).
Given the set of deliveries D represented as their distance from the base station (so abs(di-dj) is the distance between two deliveries) and an iterator permutations(D) which will produce each permutation in succession, find a permutation which has a cost less than or equal to that of any other permutation.
Now, a direct implementation from this description might lead to code like this:
function Cost(D) ...
function Best_order(D)
    for D1 in permutations(D)
        Found = true
        for D2 in permutations(D)
            Found = false if cost(D2) < cost(D1)
        return D1 if Found
Which is O(n*(n!)^2), i.e. pretty awful--especially compared to the O(n log n) someone with insight would find by simply sorting D.
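To make the gap concrete, here is roughly what the sorted solution and one possible reading of the cost rule look like in Python (my own sketch; I interpret the 3x rule as penalizing any hop that reverses the previous direction of travel):

def cost(order):
    total, position, direction = 0, 0, 0         # start at the base station (0)
    for stop in order:
        segment = (stop - position) ** 2
        new_direction = (stop > position) - (stop < position)
        if direction and new_direction and new_direction != direction:
            segment *= 3                         # a change of direction costs 3 times as much
        total += segment
        position = stop
        if new_direction:
            direction = new_direction
    return total

def best_order(deliveries):
    return sorted(deliveries)                    # visiting in sorted order never backtracks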
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?
I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult hide this fact as much as possible.
The biggest clue is the distance formula. It introduces a penalty for changing directions. The first thing that comes to my mind is minimizing this penalty. To remove the penalty I have to order the stops in a single direction, and this ordering is the natural sort order.
I would remove the penalty for changing directions, it's too much of a give away.
Another major clue is the input to the algorithm: a list of integers. Give them a list of permutations, or even all permutations. That sets them up to think that an O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by:
Return permutation P such that the cost of P is less than or equal to any other permutation.
All that really needs to be done is read in the first permutation and sort it.
If they construct a single loop to compare the costs ask them what the big-o runtime of their algorithm is where n is the number of delivery locations (Another trap).
This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 > 2 > 1 > -1.
Edit: This might not actually be the optimal path, but it illustrates the point.
You could rephrase it, having first found the optimal solution, as
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means the smallest number results from the sum of all stage costs, taking into account that all stages (A..Z) need to be present once and once only.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.
This reminds me of the Knapsack problem more than the Traveling Salesman. But the Knapsack is also an NP-hard problem, so you might be able to fool people into thinking up an overly complex dynamic programming solution if they associate your problem with the Knapsack, where the basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, a fairly good solution can be found when the values are unique (your distances), as such:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed empirical complexity is of the order of O((log n)^2) and very large problems can be solved very quickly, e.g. in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers.
So you might want to state that several stops/packages might share the same vj, inviting people to think about the really hard solution to:
However in the degenerate case of multiple items sharing the same value vj it becomes much more difficult, with the extreme case where vj = constant being the subset sum problem with a complexity of O(2^(N/2) N).
So if you replace value per unit of weight with value per unit of distance, and state that several distances might actually share the same values (the degenerate case), some folks might fall into this trap.
Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrasing the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines so the person would have to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons - e.g. "can you speed up the algorithm by using the optimum max subset of your best (so far) results"?
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.
You need to be clearer on whether the delivery truck has to return to base (making it a round trip), or not. If the truck does return, then a simple sort does not produce the shortest route, because the square of the return from the furthest point to base costs so much. Missing some hops on the way 'out' and using them on the way back turns out to be cheaper.
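A quick brute-force check of that claim, ignoring the direction-change penalty so only the squared hop lengths count (my own sketch):

from itertools import permutations

def round_trip_cost(order):
    stops = [0] + list(order) + [0]              # out from the base station and back again
    return sum((b - a) ** 2 for a, b in zip(stops, stops[1:]))

def compare(deliveries):
    best = min(permutations(deliveries), key=round_trip_cost)
    return round_trip_cost(best), round_trip_cost(tuple(sorted(deliveries)))

For deliveries (1, 2, 6), for example, this reports 46 for the best round trip (such as 0, 2, 6, 1, 0) against 54 for the sorted one.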
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?

Your favourite algorithm and the lesson it taught you [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
What algorithm taught you the most about programming or a specific language feature?
We have all had those moments where all of a sudden we know, just know, we have learned an important lesson for the future based on finally understanding an algorithm written by a programmer a couple of steps up the evolutionary ladder. Whose ideas and code had the magic touch on you?
General algorithms:
Quicksort (and its average-case complexity analysis), which shows that randomizing your input can be a good thing!;
balanced trees (AVL trees for example), a neat way to balance search/insertion costs;
Dijkstra and Ford-Fulkerson algorithms on graphs (I like the fact that the second one has many applications);
the LZ* family of compression algorithms (LZW for example), data compression sounded kind of magic to me until I discovered it (a long time ago :) );
the FFT, ubiquitous (re-used in so many other algorithms);
the simplex algorithm, ubiquitous as well.
Numerical related:
Euclid's algorithm to compute the gcd of two integers: one of the first algorithms, simple and elegant, powerful, has lots of generalizations;
fast multiplication of integers (Cooley-Tukey for example);
Newton iterations to invert / find a root, a very powerful meta-algorithm.
Number theory-related:
AGM-related algorithms (examples): leads to very simple and elegant algorithms to compute pi (and much more!), though the theory is quite profound (Gauss introduced elliptic functions and modular forms from it, so you can say that it gave birth to algebraic geometry...);
the number field sieve (for integer factorization): very complicated, but quite a nice theoretical result (this also goes for the AKS algorithm, which proved that PRIMES is in P).
I also enjoyed studying quantum computing (the Shor and Deutsch-Jozsa algorithms, for example): this teaches you to think outside the box.
As you can see, I'm a bit biased towards maths-oriented algorithms :)
"To iterate is human, to recurse divine" - quoted in 1989 at college.
P.S. Posted by Woodgnome while waiting for invite to join
Floyd-Warshall all-pairs shortest paths algorithm
procedure FloydWarshall()
    for k := 1 to n
        for i := 1 to n
            for j := 1 to n
                path[i][j] = min(path[i][j], path[i][k] + path[k][j]);
Here's why it's cool: when you first learn about the shortest-path problem in your graph theory course, you probably start with Dijkstra's algorithm, which solves single-source shortest paths. It's quite complicated at first, but then you get over it and fully understand it.
Then the teacher says "Now we want to solve the same problem but for ALL sources". You think to yourself, "Oh god, this is going to be a much harder problem! It's going to be at least N times more complicated than Dijkstra's algorithm!!!".
Then the teacher gives you Floyd-Warshall. And your mind explodes. Then you start to tear up at how beautifully simple the algorithm is. It's just a triply-nested loop. It only uses a simple array for its data structure.
The most eye-opening part for me is the following realization: say you have a solution for problem A. Then you have a bigger "superproblem" B which contains problem A. The solution to problem B may in fact be simpler than the solution to problem A.
This one might sound trivial but it was a revelation for me at the time.
I was in my very first programming class (VB6) and the prof had just taught us about random numbers, and he gave the following instructions: "Create a virtual lottery machine. Imagine a glass ball full of 100 ping pong balls marked 0 to 99. Pick them randomly and display their number until they have all been selected, no duplicates."
Everyone else wrote their program like this: pick a ball, put its number into an "already selected list" and then pick another ball. Check to see if it's already selected; if so pick another ball, if not put its number on the "already selected list", etc.
Of course by the end they were making hundreds of comparisons to find the few balls that had not already been picked. It was like throwing the balls back into the jar after selecting them. My revelation was to throw balls away after picking.
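In Python terms, the "throw the balls away" version is something like this (my own sketch; random.shuffle would do the same job in one line):

import random

def lottery_draw(n=100):
    balls = list(range(n))                       # the glass ball: 0..99
    drawn = []
    while balls:
        pick = random.randrange(len(balls))
        drawn.append(balls.pop(pick))            # discard the ball once it has been picked
    return drawn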
I know this sounds mind-numbingly obvious but this was the moment that the "programming switch" got flipped in my head. This was the moment that programming went from trying to learn a strange foreign language to trying to figure out an enjoyable puzzle. And once I made that mental connection between programming and fun there was really no stopping me.
Huffman coding would be mine. I had originally made my own dumb version by minimizing the number of bits needed to encode text from 8 down to fewer, but had not thought about using a variable number of bits depending on frequency. Then I found Huffman coding described in a magazine article, and it opened up lots of new possibilities.
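For anyone curious, the core of the construction fits in a few lines of Python (a sketch using heapq, not taken from the article):

import heapq
from collections import Counter

def huffman_codes(text):
    heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    count = len(heap)
    while len(heap) > 1:                         # merge the two least frequent trees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):              # internal node: 0 goes left, 1 goes right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                    # leaf: this symbol gets its code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes                                 # frequent symbols end up with the short codes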
Quicksort. It showed me that recursion can be powerful and useful.
Bresenham's line drawing algorithm got me interested in realtime graphics rendering. This can be used to render filled polygons, like triangles, for things like 3D model rendering.
Recursive Descent Parsing - I remember being very impressed how such simple code could do something so seemingly complex.
Quicksort in Haskell:
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
Although I couldn't write Haskell at the time, I did understand this code, and with it recursion and the quicksort algorithm. It just clicked, and there it was...
The iterative algorithm for Fibonacci, because for me it nailed down the fact that the most elegant code (in this case, the recursive version) is not necessarily the most efficient.
To elaborate: the "fib(10) = fib(9) + fib(8)" approach means that fib(9) will be evaluated as fib(8) + fib(7). So fib(8) (and therefore fib(7), fib(6), ...) will be evaluated twice.
The iterative method (curr = prev1 + prev2 in a for loop) does not tree out this way, nor does it take as much memory, since it uses only 3 transient variables instead of n frames on the recursion stack.
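Side by side, the two versions being contrasted look roughly like this in Python:

def fib_recursive(n):
    # elegant, but fib(n - 1) and fib(n - 2) recompute the same subproblems over and over
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr           # only two values carried along
    return prev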
I tend to strive for simple, elegant code when I'm programming, but this is the algorithm that helped me realize that this isn't the end-all-be-all for writing good software, and that ultimately the end users don't care how your code looks.
For some reason I like the Schwartzian transform
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [$_, foo($_)] }
          @unsorted;
Where foo($_) represents a compute-intensive expression that takes $_ (each item of the list in turn) and produces the corresponding value on which it is to be compared.
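The same decorate-sort-undecorate idea translates almost word for word into Python (toy foo and data, just for illustration):

def foo(item):
    return len(item)                             # stands in for the expensive computation

unsorted_items = ["banana", "kiwi", "apple"]
decorated = [(foo(item), item) for item in unsorted_items]   # decorate: compute foo once per item
decorated.sort(key=lambda pair: pair[0])                     # sort on the cached value
sorted_items = [item for _, item in decorated]               # undecorate
# in practice this collapses to: sorted(unsorted_items, key=foo)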
Minimax taught me that chess programs aren't smart, they can just think more moves ahead than you can.
I don't know if this qualifies as an algorithm, or just a classic hack. In either case, it helped to get me to start thinking outside the box.
Swap 2 integers without using an intermediate variable (in C++)
void InPlaceSwap(int& a, int& b) {
    // note: if a and b refer to the same variable, this zeroes it out
    a ^= b;
    b ^= a;
    a ^= b;
}
Quicksort: Until I got to college, I had never questioned whether brute force Bubble Sort was the most efficient way to sort. It just seemed intuitively obvious. But being exposed to non-obvious solutions like Quicksort taught me to look past the obvious solutions to see if something better is available.
For me it's the weak-heapsort algorithm, because it shows (1) how much a wisely chosen data structure (and the algorithms working on it) can influence performance, and (2) that fascinating things can be discovered even in old, well-known things (weak-heapsort is the best variant of all the heapsorts, which was proven eight years later).
This is a slow one :)
I learned lots about both C and computers in general by understanding Duff's Device and XOR swaps
EDIT:
@Jason Z, that's my XOR swap :) cool, isn't it?
For some reason Bubble Sort has always stood out to me. Not because it's elegant or good, just because it had/has a goofy name, I suppose.
The iterative algorithm for Fibonacci, because for me it nailed down the fact that the most elegant code (in this case, the recursive version) is not necessarily the most efficient.
The iterative method (curr = prev1 + prev2 in a for loop) does not tree out this way, nor does it take as much memory, since it uses only 3 transient variables instead of n frames on the recursion stack.
You know that Fibonacci has a closed-form solution that allows direct computation of the result in a fixed number of steps, right? Namely, (phi^n - (1 - phi)^n) / sqrt(5). It always strikes me as somewhat remarkable that this should yield an integer, but it does.
phi is the golden ratio, of course; (1 + sqrt(5)) / 2.
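In code the closed form is a one-liner (a quick sketch), with the caveat that floating point keeps it exact only for small n (roughly n <= 70 with doubles):

from math import sqrt

def fib_closed_form(n):
    phi = (1 + sqrt(5)) / 2
    return round((phi ** n - (1 - phi) ** n) / sqrt(5))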
I don't have a favourite -- there are so many beautiful ones to pick from -- but one I've always found intriguing is the Bailey–Borwein–Plouffe (BBP) formula, which enables you to calculate an arbitrary hexadecimal digit of pi without knowledge of the preceding digits.
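Just summing the series already converges quickly (each term contributes roughly one more hex digit); the digit-extraction trick the formula is famous for additionally relies on modular exponentiation, which this sketch leaves out:

def pi_bbp(terms=10):
    return sum(1 / 16 ** k * (4 / (8 * k + 1) - 2 / (8 * k + 4)
                              - 1 / (8 * k + 5) - 1 / (8 * k + 6))
               for k in range(terms))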
RSA introduced me to the world of modular arithmetic, which can be used to solve a surprising number of interesting problems!
Hasn't taught me much, but the Johnson–Trotter Algorithm never fails to blow my mind.
Binary decision diagrams, though formally not an algorithm but a data structure, lead to elegant and minimal solutions for various sorts of (boolean) logic problems. They were invented and developed to minimise the gate count in chip design, and can be viewed as one of the foundations of the silicon revolution. The resulting algorithms are amazingly simple.
What they taught me:
a compact representation of any problem is important; small is beautiful
a small set of constraints/reductions applied recursively can be used to accomplish this
for problems with symmetries, transformation to a canonical form should be the first step to consider
not every piece of literature gets read: Knuth found out about BDDs several years after their invention/introduction (and spent almost a year investigating them)
For me, the simple swap in Kelly & Pohl's A Book on C to demonstrate call-by-reference flipped me out when I first saw it. I looked at that, and pointers snapped into place. Verbatim. . .
void swap(int *p, int *q)
{
    int temp;

    temp = *p;
    *p = *q;
    *q = temp;
}
The Towers of Hanoi algorithm is one of the most beautiful algorithms. It shows how you can use recursion to solve a problem in a much more elegant fashion than the iterative method.
Alternatively, the recursive algorithms for the Fibonacci series and for calculating powers of a number demonstrate the reverse situation: recursion used for the sake of recursion rather than because it provides good value.
An algorithm that generates a list of primes by comparing each number to the current list of primes, adding it if it's not found, and returning the list of primes at the end. Mind-bending in several ways, not the least of which being the idea of using the partially-completed output as the primary search criteria.
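That idea in Python, for reference (trial division against the primes found so far; my own sketch):

def primes_up_to(limit):
    primes = []
    for n in range(2, limit + 1):
        if all(n % p for p in primes):           # no previously found prime divides n
            primes.append(n)                     # the partial output drives the search
    return primes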
Storing two pointers in a single word for a doubly linked list taught me the lesson that you can indeed do very bad things in C (things a conservative GC will have lots of trouble with).
The most proud I've been of a solution was writing something very similar to the DisplayTag package. It taught me a lot about code design, maintainability, and reuse. I wrote it well before DisplayTag, and it was sunk into an NDA agreement, so I couldn't open source it, but I can still speak gushingly about that one in job interviews.
Map/Reduce. Two simple concepts that fit together to make a load of data-processing tasks easier to parallelize.
Oh... and it's only the basis of massively-parallel indexing:
http://labs.google.com/papers/mapreduce.html
Not my favorite, but the Miller-Rabin algorithm for testing primality showed me that being right almost all the time is good enough almost all the time. (i.e. Don't mistrust a probabilistic algorithm just because it has a probability of being wrong.)
@Krishna Kumar
The bitwise solution is even more fun than the recursive solution.
