Related
I was trying various methods to implement a program that gives the digits of pi sequentially. I tried the Taylor series method, but it proved to converge extremely slowly (when I compared my result with the online values after some time). Anyway, I am trying better algorithms.
So, while writing the program I got stuck on a problem, as with all algorithms: How do I know that the n digits that I've calculated are accurate?
Since I'm the current world record holder for the most digits of pi, I'll add my two cents:
Unless you're actually setting a new world record, the common practice is just to verify the computed digits against the known values. So that's simple enough.
In fact, I have a webpage that lists snippets of digits for the purpose of verifying computations against them: http://www.numberworld.org/digits/Pi/
But when you get into world-record territory, there's nothing to compare against.
Historically, the standard approach for verifying that computed digits are correct is to recompute the digits using a second algorithm. So if either computation goes bad, the digits at the end won't match.
This does typically more than double the amount of time needed (since the second algorithm is usually slower). But it's the only way to verify the computed digits once you've wandered into the uncharted territory of never-before-computed digits and a new world record.
Back in the days where supercomputers were setting the records, two different AGM algorithms were commonly used:
Gauss–Legendre algorithm
Borwein's algorithm
These are both O(N log(N)^2) algorithms that were fairly easy to implement.
However, nowadays, things are a bit different. In the last three world records, instead of performing two computations, we performed only one computation using the fastest known formula (Chudnovsky Formula):
This algorithm is much harder to implement, but it is a lot faster than the AGM algorithms.
Then we verify the binary digits using the BBP formulas for digit extraction.
This formula allows you to compute arbitrary binary digits without computing all the digits before it. So it is used to verify the last few computed binary digits. Therefore it is much faster than a full computation.
The advantage of this is:
Only one expensive computation is needed.
The disadvantage is:
An implementation of the Bailey–Borwein–Plouffe (BBP) formula is needed.
An additional step is needed to verify the radix conversion from binary to decimal.
I've glossed over some details of why verifying the last few digits implies that all the digits are correct. But it is easy to see this since any computation error will propagate to the last digits.
Now this last step (verifying the conversion) is actually fairly important. One of the previous world record holders actually called us out on this because, initially, I didn't give a sufficient description of how it worked.
So I've pulled this snippet from my blog:
N = # of decimal digits desired
p = 64-bit prime number
Compute A using base 10 arithmetic and B using binary arithmetic.
If A = B, then with "extremely high probability", the conversion is correct.
For further reading, see my blog post Pi - 5 Trillion Digits.
Undoubtedly, for your purposes (which I assume is just a programming exercise), the best thing is to check your results against any of the listings of the digits of pi on the web.
And how do we know that those values are correct? Well, I could say that there are computer-science-y ways to prove that an implementation of an algorithm is correct.
More pragmatically, if different people use different algorithms, and they all agree to (pick a number) a thousand (million, whatever) decimal places, that should give you a warm fuzzy feeling that they got it right.
Historically, William Shanks published pi to 707 decimal places in 1873. Poor guy, he made a mistake starting at the 528th decimal place.
Very interestingly, in 1995 an algorithm was published that had the property that would directly calculate the nth digit (base 16) of pi without having to calculate all the previous digits!
Finally, I hope your initial algorithm wasn't pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... That may be the simplest to program, but it's also one of the slowest ways to do so. Check out the pi article on Wikipedia for faster approaches.
You could use multiple approaches and see if they converge to the same answer. Or grab some from the 'net. The Chudnovsky algorithm is usually used as a very fast method of calculating pi. http://www.craig-wood.com/nick/articles/pi-chudnovsky/
The Taylor series is one way to approximate pi. As noted it converges slowly.
The partial sums of the Taylor series can be shown to be within some multiplier of the next term away from the true value of pi.
Other means of approximating pi have similar ways to calculate the max error.
We know this because we can prove it mathematically.
You could try computing sin(pi/2) (or cos(pi/2) for that matter) using the (fairly) quickly converging power series for sin and cos. (Even better: use various doubling formulas to compute nearer x=0 for faster convergence.)
BTW, better than using series for tan(x) is, with computing say cos(x) as a black box (e.g. you could use taylor series as above) is to do root finding via Newton. There certainly are better algorithms out there, but if you don't want to verify tons of digits this should suffice (and it's not that tricky to implement, and you only need a bit of calculus to understand why it works.)
There is an algorithm for digit-wise evaluation of arctan, just to answer the question, pi = 4 arctan 1 :)
I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a (1+d x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function based on the B for that point. To get B, use the Newton-Raphson algorithm recursively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson algorithm itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a (1+d x N) matrix, which doesn't match up with B, which is (1+d x 1). The only way that would work, then, is if e were a (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's Constant, and it just so happens that that ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest, to me, that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, or a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w and form the diagonal of the weight algorithm W.
So, my two main questions here are, what is e on page 67 and is that the 1xN matrix I need, and how does pi' in equation 4-5 end up a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.
I've always been writing software to solve business problems. I came across about LIP while I was going through one of the SO posts. I googled it but I am unable to relate how I can use it to solve business problems. Appreciate if some one can help me understand in layman terms.
ILP can be used to solve essentially any problem involving making a bunch of decisions, each of which only has several possible outcomes, all known ahead of time, and in which the overall "quality" of any combination of choices can be described using a function that doesn't depend on "interactions" between choices. To see how it works, it's easiest to restrict further to variables that can only be 0 or 1 (the smallest useful range of integers). Now:
Each decision requiring a yes/no answer becomes a variable
The objective function should describe the thing we want to maximise (or minimise) as a weighted combination of these variables
You need to find a way to express each constraint (combination of choices that cannot be made at the same time) using one or more linear equality or inequality constraints
Example
For example, suppose you have 3 workers, Anne, Bill and Carl, and 3 jobs, Dusting, Typing and Packing. All of the people can do all of the jobs, but they each have different efficiency/ability levels at each job, so we want to find the best task for each of them to do to maximise overall efficiency. We want each person to perform exactly 1 job.
Variables
One way to set this problem up is with 9 variables, one for each combination of worker and job. The variable x_ad will get the value 1 if Anne should Dust in the optimal solution, and 0 otherwise; x_bp will get the value 1 if Bill should Pack in the optimal solution, and 0 otherwise; and so on.
Objective Function
The next thing to do is to formulate an objective function that we want to maximise or minimise. Suppose that based on Anne, Bill and Carl's most recent performance evaluations, we have a table of 9 numbers telling us how many minutes it takes each of them to perform each of the 3 jobs. In this case it makes sense to take the sum of all 9 variables, each multiplied by the time needed for that particular worker to perform that particular job, and to look to minimise this sum -- that is, to minimise the total time taken to get all the work done.
Constraints
The final step is to give constraints that enforce that (a) everyone does exactly 1 job and (b) every job is done by exactly 1 person. (Note that actually these steps can be done in any order.)
To make sure that Anne does exactly 1 job, we can add the constraint that x_ad + x_at + x_ap = 1. Similar constraints can be added for Bill and Carl.
To make sure that exactly 1 person Dusts, we can add the constraint that x_ad + x_bd + x_cd = 1. Similar constraints can be added for Typing and Packing.
Altogether there are 6 constraints. You can now supply this 9-variable, 6-constraint problem to an ILP solver and it will spit back out the values for the variables in one of the optimal solutions -- exactly 3 of them will be 1 and the rest will be 0. The 3 that are 1 tell you which people should be doing which job!
ILP is General
As it happens, this particular problem has a special structure that allows it to be solved more efficiently using a different algorithm. The advantage of using ILP is that variations on the problem can be easily incorporated: for example if there were actually 4 people and only 3 jobs, then we would need to relax the constraints so that each person does at most 1 job, instead of exactly 1 job. This can be expressed simply by changing the equals sign in each of the 1st 3 constraints into a less-than-or-equals sign.
First, read a linear programming example from Wikipedia
Now imagine the farmer producing pigs and chickens, or a factory producing toasters and vacuums - now the outputs (and possibly constraints) are integers, so those pretty graphs are going to go all crookedly step-wise. That's a business application that is easily represented as a linear programming problem.
I've used integer linear programming before to determine how to tile n identically proportioned images to maximize screen space used to display these images, and the formalism can represent covering problems like scheduling, but business applications of integer linear programming seem like the more natural applications of it.
SO user flolo says:
Use cases where I often met it: In digital circuit design you have objects to be placed/mapped onto certain parts of a chip (FPGA-Placing) - this can be done with ILP. Also in HW-SW codesign there often arise the partition problem: Which part of a program should still run on a CPU and which part should be accelerated on hardware. This can be also solved via ILP.
A sample ILP problem will looks something like:
maximize 37∙x1 + 45∙x2
where
x1,x2,... should be positive integers
...but, there is a set of constrains in the form
a1∙x1+b1∙x2 < k1
a2∙x1+b2∙x2 < k2
a3∙x1+b3∙x2 < k3
...
Now, a simpler articulation of Wikipedia's example:
A farmer has L m² land to be planted with either wheat or barley or a combination of the two.
The farmer has F grams of fertilizer, and P grams of insecticide.
Every m² of wheat requires F1 grams of fertilizer, and P1 grams of insecticide
Every m² of barley requires F2 grams of fertilizer, and P2 grams of insecticide
Now,
Let a1 denote the selling price of wheat per 1 m²
Let a2 denote the selling price of barley per 1 m²
Let x1 denote the area of land to be planted with wheat
Let x2 denote the area of land to be planted with barley
x1,x2 are positive integers (Assume we can plant in 1 m² resolution)
So,
the profit is a1∙x1 + a2∙x2 - we want to maximize it
Because the farmer has a limited area of land: x1+x2<=L
Because the farmer has a limited amount of fertilizer: F1∙x1+F2∙x2 < F
Because the farmer has a limited amount of insecticide: P1∙x1+P2∙x2 < P
a1,a2,L,F1,F2,F,P1,P2,P - are all constants (in our example: positive)
We are looking for positive integers x1,x2 that will maximize the expression stated, given the constrains stated.
Hope it's clear...
ILP "by itself" can directly model lots of stuff. If you search for LP examples you will probably find lots of famous textbook cases, such as the diet problem
Given a set of pills, each with a vitamin content and a daily vitamin
quota, find the cheapest cocktail that matches the quota.
Many such problems naturally have instances that require varialbe to be integers (perhaps you can't split pills in half)
The really interesting stuff though is that actually a big deal of combinatorial problems reduce to LP. One of my favourites is the assignment problem
Given a set of N workers, N tasks and an N by N matirx describing how
much each worker charges for the each task, determine what task to
give to each worker in order to minimize cost.
Most solution that naturally come up have exponential complexity but there is a polynomial solution using linear programming.
When it comes to ILP, ILP has the added benefit/difficulty of being NP-complete. This means that it can be used to model a very wide range of problems (boolean satisfiability is also very popular in this regard). Since there are many good and optimized ILP solvers out there it is often viable to translate an NP-complete problem into ILP instead of devising a custom algorithm of your own.
You can apply linear program easily everywhere you want to optimize and the target function is linear. You can make schedules (I mean big, like train companies, who need to optimize the utilization of the vehicles and tracks), productions (optimize win), almost everything. Sometimes it is tricky to formulate your problem as IP and/or sometimes you meet the problem that your solution is, that you have to produce e.g. 0.345 cars for optimum win. That is of course not possible, and so you constraint even more: Your variable for the number of cars must be integer. Even when it now sounds simpler (because you have infinite less choices for your variable), its actually harder. In this moment it gets NP-hard. Which actually means you can solve ANY problem from your computer with ILP, you just have to transform it.
For you I would recommend an intro into reading some basic (I)LP stuff. From my mind I dont know any good online site (but if you goolge you will find some), as book I can recommend Linear Programming from Chvatal. It has very good examples, and describes also real use cases.
The other answers here have excellent examples. Two of the gold standards in business of using integer programming and more generally operations research are
the journal Interfaces published by INFORMS (The Institute for Operations Research and the Management Sciences)
winners of the the Franz Edelman Award for Achievement in Operations Research and the Management Sciences
Interfaces publishes research that uses operations research applied to real-world problems, and the Edelman award is a highly competitive award for business use of operations research techniques.
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 12 years ago.
Improve this question
I read over the wikipedia article, but it seems to be beyond my comprehension. It says it's for optimization, but how is it different than any other method for optimizing things?
An answer that introduces me to linear programming so I can begin diving into some less beginner-accessible material would be most helpful.
The answers so far have given an algebraic definition of linear programming, and an operational definition. But there is also a geometric definition. A polytope is an n-dimensional generalization of a polygon (in two dimensions) or a polyhedron (in three dimensions). A convex polytope is a polytope which is also a convex set. By definition, linear programming is an optimization problem in which you want to maximize or minimize a linear function on a convex polytope.
For example: Suppose that you want to buy some combination of red sand and blue sand. Suppose also:
You can't buy a negative amount of either kind.
The depot only has 300 pounds of red sand and 400 pounds of blue sand.
Also your jeep has a weight limit of 500 pounds.
If you draw a picture in the plane of how much you can buy with these constraints, it's a convex pentagon. Then, whatever you want to optimize (say, the total amount of gold in the sand), you can know that an optimum (not necessarily the only optimum) is at one of the vertices of the polytope. In fact, there is a much stronger result: Even in high dimensions, any such linear programming problem can be solved in polynomial time, in the number of constraints, or putative sides of the polytope. Note that not every constraint corresponds to a side. If the constraint is an equality, it might reduce the dimension of the polytope. Or if the constraint is an inequality, it might be not create a side if it is already implied by all of the other constraints.
There are a lot of practical optimization problems that are linear programming. One of the first examples was the "diet problem": Given a menu of a bunch of kinds of food, what is the cheapest possible balanced diet? It's a linear programming problem because the cost is linear, and because all of the constraints (vitamins, calories, the assumption that you can't buy a negative amount of food, etc.) are linear.
But, linear programming is even more important for a theoretical reason. It is one of the most powerful polynomial-time algorithms for optimization or for any other purpose. As such, it is very important as a substitute for approximately solving other optimization problems, and as a subroutine for exactly solving them.
Yes, two generalizations are convex programming and integer programming. With some qualificiations, convex programming can work just as well as linear programming, provided that the objective (the thing to maximize) is linear. It turns out that convexity, not flat sides, is the main reason that linear programming has a good algorithm.
Integer programming, on the other hand, is usually hard. For instance, suppose in the example problem you have to buy the sand in fixed-size bags rather than in bulk; that is then integer programming. There is a theorem that it can be NP-hard. How hard it is in practice depends on how close it is to linear programming. There are some celebrated examples of integer programming problems in which, miraculously, all of the vertices of the linear program are integer points. Then you can solve the linear program and the solution will happen to be integral. One example of such a problem is the marriage problem, how to marry n men and n women to each other to maximize total happiness. (Or, n cities to n factories, n jobs to n applicants, n computers to n printers, etc.)
Linear programming is a topic of 'mathematical programming', which is also called 'mathematical optimization'. Linear programs differ from general mathematical programs in that for a Linear Program (LP) all constraint functions and the objective function are linear with respect to their variables.
A good place to start would be here if you want the original work by Dantzig, or if you want to get a textbook, I recommend this one. If you want to look up your own resources, start with looking up the Simplex method--it is a very common technique to solve these programs, or the less common but definitely polynomial time Ellipsoid method. Though I haven't read it all, looking it over quickly also suggests this PDF may be a good place to start. Make sure whatever you end up reading covers duality (and perhaps specifically the Farkas' lemma) as it's a central idea in most LP solvers.
The most natural extensions are either Integer programs (similar to LP's, but all variables must take on integer values--that is, no fractional components) or Convex programming (perhaps a more general extension). A good convex optimization text book is available in PDF form here.
As everyone else has said, Linear Programming is a way to solve optimization problems, in which the terms are linear.
It might help to understand what types of problems LP's solve
One example where I've used Linear Programming is building a restaurant schedule. In a restaurant you have skill sets:
Cooks
Servers
Dishwashers
Hosts
Bussers
Manager
etc
And you have employees, each with one or more skill sets. Each employee also has a specific availability. For example, Bob can't work Sunday mornings because he's a pastor at a local church. Employees also have an associated cost. Bob might be $10.50/hr while Suzy is $5.15/hr. Finally Employees might have minimum guaranteed hours. Since Bob has been an employee for 15 years, the boss says he'll always get at least 35 hours.
The restaurant itself has demands. For example it has 3 shifts: Morning, Afternoon, and Night, and each of these shifts has a set of staffing requirements: We need 1 cook, 1 server, 1 manager in the morning, 3 cooks, 2 servers, 2 hosts, 2 managers in the afternoon, and 4 cooks, 4 servers, 3 hosts, 2 managers, 2 bussers in the evening. Each shift will have a duration, and you can figure out the cost of each shift by multiplying the duration by the employee's hourly wage.
Finally we have state and federal laws, and some basic "business" rules: No employee can work more than 8-hours without going into overtime. No employee can be scheduled for less than 2 hours (because it would suck to make a 30 minute commute for a 2 hour shift), employees can't work two overlapping shifts, etc.
Now given all these requirements, give me a schedule that has meets all requirements, and produces the lowest labor cost.
This is an example of a linear programming optimization problem.
A linear program typically consists of:
An objective function, variables, variable bounds, and constraints.
Since we want to minimize cost, our objective function is going to involve shifts that employees work, and the associated costs (shift duration * wage).
The variables in this case, are the shifts that each employee can work.
The bounds on these variables integers between 0 and 1, because either an employee is working the shift (1), or the employee is not working the shift (0). This, by the way, is a special program, called a Binary Integer Program or BIP for short, because all the variables are integers (no fractional values) and all values are either 0 or 1.
The constraints are equality/inequality constraints based on the requirements above.
For example, if Bob and Suzy can both work as cooks in the morning, then Bob_Morning_Cook1_Shift + Suzy_Morning_Cook1_Shift = 1, with Bob_Morning_Cook_Shift = {0,1} and Suzy_Morning_Cook_Shift = {0,1} due to the bounds mentioned above. These three pieces of information specify that at most only a single employee can be assigned as the first morning cook.
So once you've defined all the constraints that model your problem, you can begin to solve the problem. If a solution can be found (and depending on the constraints the problem may be infeasible), it will give you the employee assignments that produce the lowest weekly labor cost.
Linear programming is an optimization technique that involves linear constraints and a linear objective function. The constraints are written to bound the problem space, while the objective function is something that you are attempting to minimize (or possibly maximize) that satisfies the constraints. The simplex algorithm is typically used to walk along edges of the constraint intersection to find the minimum (or maximum) value of the objective function satisfying the constraints.
When setting up an LP problem, it's important to make sure that the constraints properly bound the objective function. It's possible to define constraints which result in no possible solution (e.g. x > 1 and -x > 1). This is over constrained. It's also possible to under-constrain a problem (e.g. find min x such that x < 1).
One big difference (or at least, distinguishing feature) of linear programming is that the constraints are modeled as linear equations, i.e. they're all of the form c_1 x_1 + c_2 x_2.... The standard form section of the wikipedia article gives a pretty good overview of this.
Another difference/feature is that linear programming is seeking to maximize (or minimize) ONE function - you can't effectively do multi-objective optimization.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
What algorithm taught you the most about programming or a specific language feature?
We have all had those moments where all of a sudden we know, just know, we have learned an important lesson for the future based on finally understanding an algorithm written by a programmer a couple of steps up the evolutionary ladder. Whose ideas and code had the magic touch on you?
General algorithms:
Quicksort (and it's average complexity analysis), shows that randomizing your input can be a good thing!;
balanced trees (AVL trees for example), a neat way to balance search/insertion costs;
Dijkstra and Ford-Fulkerson algorithms on graphs (I like the fact that the second one has many applications);
the LZ* family of compression algorithms (LZW for example), data compression sounded kind of magic to me until I discovered it (a long time ago :) );
the FFT, ubiquitous (re-used in so many other algorithms);
the simplex algorithm, ubiquitous as well.
Numerical related:
Euclid's algorithm to compute the gcd of two integers: one of the first algorithms, simple and elegant, powerful, has lots of generalizations;
fast multiplication of integers (Cooley-Tukey for example);
Newton iterations to invert / find a root, a very powerful meta-algorithm.
Number theory-related:
AGM-related algorithms (examples): leads to very simple and elegant algorithms to compute pi (and much more!), though the theory is quite profound (Gauss introduced elliptic functions and modular forms from it, so you can say that it gave birth to algebraic geometry...);
the number field sieve (for integer factorization): very complicated, but quite a nice theoretical result (this also goes for the AKS algorithm, which proved that PRIMES is in P).
I also enjoyed studying quantum computing (Shor and Deutsch-Josza algorithms for example): this teaches you to think out of the box.
As you can see, I'm a bit biased towards maths-oriented algorithms :)
"To iterate is human, to recurse divine" - quoted in 1989 at college.
P.S. Posted by Woodgnome while waiting for invite to join
Floyd-Warshall all-pairs shortest paths algorithm
procedure FloydWarshall ()
for k := 1 to n
for i := 1 to n
for j := 1 to n
path[i][j] = min ( path[i][j], path[i][k]+path[k][j] );
Here's why it's cool: when you first learn about the shortest-path problem in your graph theory course, you probably start with Dijkstra's algorithm that solves single-source shortest path. It's quite complicated at first, but then you get over it, and you fully understood it.
Then the teacher says "Now we want to solve the same problem but for ALL sources". You think to yourself, "Oh god, this is going to be a much harder problem! It's going to be at least N times more complicated than Dijkstra's algorithm!!!".
Then the teacher gives you Floyd-Warshall. And your mind explodes. Then you start to tear up at how beautifully simple the algorithm is. It's just a triply-nested loop. It only uses a simple array for its data structure.
The most eye-opening part for me is the following realization: say you have a solution for problem A. Then you have a bigger "superproblem" B which contains problem A. The solution to problem B may in fact be simpler than the solution to problem A.
This one might sound trivial but it was a revelation for me at the time.
I was in my very first programming class(VB6) and the Prof had just taught us about random numbers and he gave the following instructions: "Create a virtual lottery machine. Imagine a glass ball full of 100 ping pong balls marked 0 to 99. Pick them randomly and display their number until they have all been selected, no duplicates."
Everyone else wrote their program like this: Pick a ball, put its number into an "already selected list" and then pick another ball. Check to see if its already selected, if so pick another ball, if not put its number on the "already selected list" etc....
Of course by the end they were making hundreds of comparisons to find the few balls that had not already been picked. It was like throwing the balls back into the jar after selecting them. My revelation was to throw balls away after picking.
I know this sounds mind-numbingly obvious but this was the moment that the "programming switch" got flipped in my head. This was the moment that programming went from trying to learn a strange foreign language to trying to figure out an enjoyable puzzle. And once I made that mental connection between programming and fun there was really no stopping me.
Huffman coding would be mine, I had originally made my own dumb version by minimizing the number of bits to encode text from 8 down to less, but had not thought about variable number of bits depending on frequency. Then I found the huffman coding described in an article in a magazine and it opened up lots of new possibilities.
Quicksort. It showed me that recursion can be powerful and useful.
Bresenham's line drawing algorithm got me interested in realtime graphics rendering. This can be used to render filled polygons, like triangles, for things like 3D model rendering.
Recursive Descent Parsing - I remember being very impressed how such simple code could do something so seemingly complex.
Quicksort in Haskell:
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
Although I couldn'd write Haskell at the time, I did understand this code and with it recursion and the quicksort algorithm. It just made click and there it was...
The iterative algorithm for Fibonacci, because for me it nailed down the fact that the most elegant code (in this case, the recursive version) is not necessarily the most efficient.
To elaborate- The "fib(10) = fib(9) + fib(8)" approach means that fib(9) will be evaluated to fib(8) + fib(7). So evaluation of fib(8) (and therefor fib7, fib6) will all be evaluated twice.
The iterative method, (curr = prev1 + prev2 in a forloop) does not tree out this way, nor does it take as much memory since it's only 3 transient variables, instead of n frames in the recursion stack.
I tend to strive for simple, elegant code when I'm programming, but this is the algorithm that helped me realize that this isn't the end-all-be-all for writing good software, and that ultimately the end users don't care how your code looks.
For some reason I like the Schwartzian transform
#sorted = map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, foo($_)] }
#unsorted;
Where foo($) represents a compute-intensive expression that takes $ (each item of the list in turn) and produces the corresponding value that is to be compared in its sake.
Minimax taught me that chess programs aren't smart, they can just think more moves ahead than you can.
I don't know if this qualifies as an algorithm, or just a classic hack. In either case, it helped to get me to start thinking outside the box.
Swap 2 integers without using an intermediate variable (in C++)
void InPlaceSwap (int& a, int &b) {
a ^= b;
b ^= a;
a ^= b;
}
Quicksort: Until I got to college, I had never questioned whether brute force Bubble Sort was the most efficient way to sort. It just seemed intuitively obvious. But being exposed to non-obvious solutions like Quicksort taught me to look past the obvious solutions to see if something better is available.
For me it's the weak-heapsort algorithm because it shows (1) how much a wise chosen data structure (and the algorithms working on it) can influence the performance and (2) that fascinating things can be discovered even in old, well-known things. (weak-heapsort is the best variant of all heap sorts, which was proven eight years later.)
This is a slow one :)
I learned lots about both C and computers in general by understanding Duffs Device and XOR swaps
EDIT:
#Jason Z, that's my XOR swap :) cool isn't it.
For some reason Bubble Sort has always stood out to me. Not because it's elegant or good just because it had/has a goofy name I suppose.
The iterative algorithm for Fibonacci, because for me it nailed down the fact that the most elegant code (in this case, the recursive version) is not necessarily the most efficient.
The iterative method, (curr = prev1 + prev2 in a forloop) does not tree out this way, nor does it take as much memory since it's only 3 transient variables, instead of n frames in the recursion stack.
You know that fibonacci has a closed form solution that allows direct computation of the result in a fixed number of steps, right? Namely, (phin - (1 - phi)n) / sqrt(5). It always strikes me as somewhat remarkable that this should yield an integer, but it does.
phi is the golden ratio, of course; (1 + sqrt(5)) / 2.
I don't have a favourite -- there are so many beautiful ones to pick from -- but one I've always found intriguing is the Bailey–Borwein–Plouffe (BBP) formula, which enables you to calculate an arbitrary digit of pi without knowledge about the preceding digits.
RSA introduced me to the world of modular arithmetic, which can be used to solve a surprising number of interesting problems!
Hasn't taught me much, but the Johnson–Trotter Algorithm never fails to blow my mind.
Binary decision diagrams, though formally not an algorithm but a datastructure, lead to elegant and minimal solutions for various sorts of (boolean) logic problems. They were invented and developped to minimise the gate count in chip-design, and can be viewed as one of the fundaments of the silicon revolution. The resulting algorithms are amazingly simple.
What they taught me:
a compact representation of any problem is important; small is beautiful
a small set of constraints/reductions applied recursively can be used to accomplish this
for problems with symmetries, tranformation to a canonical form should be the first step to consider
not every piece of literature is read. Knuth found out about BDD's several years after their invention/introduction. (and spent almost a year investigating them)
For me, the simple swap in Kelly & Pohl's A Book on C to demonstrate call-by-reference flipped me out when I first saw it. I looked at that, and pointers snapped into place. Verbatim. . .
void swap(int *p, int *q)
{
int temp;
temp = *p;
*p = *q;
*q = temp;
}
The Towers of Hanoi algorithm is one of the most beautiful algorithms. It shows how you can use recursion to solve a problem in a much more elegant fashion than the iterative method.
Alternatively, the recursion algorithm for Fibonacci series and calculating powers of a number demonstrate the reverse situation of recursive algorithm being used for the sake of recursion instead of providing good value.
An algorithm that generates a list of primes by comparing each number to the current list of primes, adding it if it's not found, and returning the list of primes at the end. Mind-bending in several ways, not the least of which being the idea of using the partially-completed output as the primary search criteria.
Storing two pointers in a single word for a doubly linked list tought me the lesson that you can do very bad things in C indeed (with which a conservative GC will have lots of trouble).
The most proud I've been of a solution was writing something very similar to the DisplayTag package. It taught me a lot about code design, maintainability, and reuse. I wrote it well before DisplayTag, and it was sunk into an NDA agreement, so I couldn't open source it, but I can still speak gushingly about that one in job interviews.
Map/Reduce. Two simple concepts that fit together to make a load of data-processing tasks easier to parallelize.
Oh... and it's only the basis of massively-parallel indexing:
http://labs.google.com/papers/mapreduce.html
Not my favorite, but the Miller Rabin Algorithm for testing primality showed me that being right almost all the time, is good enough almost all the time. (i.e. Don't mistrust a probabilistic algorithm just because it has a probability of being wrong.)
#Krishna Kumar
The bitwise solution is even more fun than the recursive solution.