Related
What is a collaborative algorithm? Is there a scientifically citable reference?
Details:
I found many articles about collaborative algorithms, but none (or other websites) with a definition.
I am actually looking for a term to describe distributed algorithms where each instance has all information at the beginning and can complete the whole task on its own, but the instances help each other whenever they have solved a sub-problem, so the other instances do not have to redo the work (hence "collaboration"). I picked up this terminology in A Collaborative Approach for Multi-Threaded SAT Solving. Do you think the term "collaborative algorithm" is suitable for this? If not, do you know of a better term?
No, there's no scientifically citable references.
All parallel/distributed programming is "collaborative" in a sense that several threads/nodes are collaborating on the same big task.
distributed algorithms where.. instances help each other whenever they have solved a sub-problem - even some web application clusters fit your description: individual cluster nodes "solve subproblems" and store the "solutions" in a distributed in-RAM storage (such as memcached or cassandra or many others) thus helping each other.
I think the term "collaborative algorithm" is not formal.
Actually the term "algorithm" itself is not really formal,
as far as I remember. I guess algorithm can be formalized as
"a program which runs on a Turing machine". I think I've seen
this definition somewhere.
So yes, I guess all in all the term you coined makes sense, but you
need to define it somehow yourself (either formally or informally).
Not sure what your background is but ... OK, in scientific papers
different authors sometimes use same terms/concepts for denoting
different things and sometimes they use different terms for
denoting the same thing.
Also, even though computer science papers are scientific not
all terms in them are formally defined. So I wouldn't draw
too many conclusions based on these papers unless I am familiar
to a decent extent with all of them or unless some of them
are considered really remarkable and widely accepted as
a de-facto standard in a particular sub-field or field.
Up until now I've mostly concentrated on how to properly design code, make it as readable as possible and as maintainable as possible. So I alway chose to learn about the higher level details of programming, such as class interactions, API design, etc.
Algorithms I never really found particularly interesting. As a result, even though I can come up with a good design for my programs, and even if I can come up with a solution to a given problem it rarely is the most efficient.
Is there a particular way of thinking about problems that helps you come up with an as efficient solution as possible, or is it simple a matter of practice and/or memorizing?
Also, what online resources can you recommend that teach you various efficient algorithms for different problems?
Data dominates. If you design your program around the right abstract data structures (ADTs), you often get a clean design, the algorithms follow quite naturally and when performance is lacking, you should be able to "plug in" more efficient ones.
A strong background in maths and logic helps here, as it allows you to visualize your program at a high level as the interaction between functions, sets, graphs, sequences, etc. You then decide whether the sets need to be ordered (balanced BST, O(lg n) operations) or not (hash tables, O(1) operations), what operations need to supported on sequences (vector-like or list-like), etc.
If you want to learn some algorithms, get a good book such as Cormen et al. and try to implement the main data structures:
binary search trees
generic binary search trees (that work on more than just int or strings)
hash tables
priority queues/heaps
dynamic arrays
Introduction To Algorithms is a great book to get you thinking about efficiency of different algorithms/data structures.
The authors of the book also teach an algorithms course on MIT . You can find most lectures here
I would say that in coming up with good algorithms (which is actually part of good design IMHO), you have to develop a way of thinking. This is best done by studying algorithm design. By study I don't mean just knowing all the common algorithms covered in a textbook, but actually understanding how and why they work, and being able to apply the basic idea contained in them to actual problems you are trying to solve.
I would suggest reading a good book on algorithms (my favourite is CLRS). For an online resource I would recommend the series of articles in the TopCoder Algorithm Tutorials.
I do not understand why you would mention practice and memorization in the same breath. Memorization won't help you at all (you probably already know this), but practice is essential. If you cannot apply what you learned, its not really learning. You can practice at various online programming contest/puzzle sites like SPOJ, Project Euler and PythonChallenge.
Recommendations:
First of all i recommend the book "Intro to Algorithms, Second Edition By corman", great book has most(if not all) of the algorithms you will need. (Some of the more important topics are sorting-algorithms, shortest paths, dynamic programing, many data structures like bst, hash maps, heaps).
another great way to learn algorithms is http://ace.delos.com/usacogate, great practice after the begining.
To your questions you will just get used to write good fast running code, after a little practice you just wouldnt want to write un-efficient code.
While I think #larsmans is correct inasmuch that understanding logic and maths is a fast way to understanding how to choose useful ADTs for solving a given problem, studying existing solutions may be more instructive for those of us who struggle with those topics. In particular, reviewing code of established software (OSS) that solves some similar problem as the one you're interested in.
I find a particularly good method for this method of study is reviewing unit tests of such a project. Apache Lucene, for example, has a source control repository containing numerous examples. While it doesn't reveal the underlying algorithms, it helps trace to particular functionality that solves a certain problem. This leads to an opportunity for studying its innards - i.e. an interesting algorithm. In Lucene's case inverted indices come to mind.
While this does not guarantee the algorithm you discover is the best, it's likely one that's received a lot scrutiny and probably comes from project with an active mailing that may answer your questions. So it's a good resource for finding a solution that is probably better than what most of us would come up with on our own.
First of all, is this only possible on algorithms which have no side effects?
Secondly, where could I learn about this process, any good books, articles, etc?
COQ is a proof assistant that produces correct ocaml output. It's pretty complicated though. I never got around to looking at it, but my coworker started and then stopped using it after two months. It was mostly because he wanted to get things done quicker, but if you need to verify an algorithm this might be a good idea.
Here is a course that uses COQ and talks about proving algorithms.
And here is a tutorial about writing academic papers in COQ.
It's generally a lot easier to verify/prove correctness when no side effects are involved, but it's not an absolute requirement.
You might want to look at some of the documentation for a formal specification language like Z. A formal specification isn't a proof itself, but is often the basis for one.
I think that verifying the correctness of an algorithm would be validating its conformance with a specification. There is a branch of theoretical Computer Science called Formal Methods which may be what you are looking for if you need to get as close to proof as you can. From wikipedia,
Formal Methods are a particular kind
of mathematically-based techniques for
the specification, development and
verification of software and hardware
systems
You will be able to find many learning resources and tools from the multitude of links on the linked Wikipedia page and from the Formal Methods wiki.
Usually proofs of correctness are very specific to the algorithm at hand.
However, there are several well known tricks that are used and re-used again. For example, with recursive algorithms you can use loop invariants.
Another common trick is reducing the original problem to a problem for which your algorithm's proof of correctness is easier to show, then either generalizing the easier problem or showing that the easier problem can be translated to a solution to the original problem. Here is a description.
If you have a particular algorithm in mind, you may do better in asking how to construct a proof for that algorithm rather than a general answer.
Buy these books: http://www.amazon.com/Science-Programming-Monographs-Computer/dp/0387964800
The Gries book, Scientific Programming is great stuff. Patient, thorough, complete.
Logic in Computer Science, by Huth and Ryan, gives a reasonably readable overview of modern systems for verifying systems. Once upon a time people talked about proving programs correct - with programming languages which may or may not have side effects. The impression I get from this book and elsewhere is that real applications are different - for instance proving that a protocol is correct, or that a chip's floating point unit can divide correctly, or that a lock-free routine for manipulating linked lists is correct.
ACM Computing Surveys Vol 41 Issue 4 (October 2009) is a special issue on software verification. It looks like you can get to at least one of the papers without an ACM account by searching for "Formal Methods: Practice and Experience".
The tool Frama-C, for which Elazar suggests a demo video in the comments, gives you a specification language, ACSL, for writing function contracts and various analyzers for verifying that a C function satisfies its contract and safety properties such as the absence of run-time errors.
An extended tutorial, ACSL by example, shows examples of actual C algorithms being specified and verified, and separates the side-effect-free functions from the effectful ones (the side-effect-free ones are considered easier and come first in the tutorial). This document is also interesting in that it was not written by the designers of the tools it describe, so it gives a fresher and more didactic look at these techniques.
If you are familiar with LISP then you should definitely check out ACL2: http://www.cs.utexas.edu/~moore/acl2/acl2-doc.html
Dijkstra's Discipline of Programming and his EWDs lay the foundation for formal verification as a science in programming. A simpler work is Wirth's Systematic Programming, which begins with the simple approach to using verification. Wirth uses pre-ISO Pascal for the language; Dijkstra uses an Algol-68-like formalism called Guarded (GCL). Formal verification has matured since Dijkstra and Hoare, but these older texts may still be a good starting point.
PVS tool developed by Stanford guys is a specification and verification system. I worked on it and found it very useful for Theoram Proving.
WRT (1), you will probably have to create a model of the algorithm in a way that "captures" the side-effects of the algorithm in a program variable intended to model such state-based side-effects.
In another question, I asked something similar but I ended up just posting my algorithm there and invalidating several answers. I re-ask it here:
If I "invented" an algorithm, what's the best way for me to figure out if it's already been published about/patented?
You would need to do some searching. Starting with Google search, generically, will often be sufficient to reassure you that your algorithm is not novel. If that is not conclusive, then you need to search harder, perhaps looking at searching various patent sites (Google, USPTO, other places too). If you still don't find anything, then maybe your algorithm is novel.
Next questions: is it worth it to you to try and patent it, or get someone else to patent it for you (a company, for example)? Indeed, can you patent it or does your employer already own it? This will depend in part on how likely it is that everyone else will want to use the same algorithm. The chances are, they won't. If you patent it, they will ignore it until the patent expires.
If you do find a way to afford getting the patent filed - and issued (which is not automatic just because you filed) - then you face enforcing your patent. Will you be able to identify and prosecute those who abuse your patent? If not, was it worth chasing it? Maybe, maybe not; but probably not.
Finally, note that you cannot actually patent a pure algorithm. You would have to reduce it to practice. That isn't as hard as it seems, but just be aware that pure mathematical algorithms are inherently non-patentable.
In summary:
You will probably find someone else already thought of it.
If you decide to patent it because it is novel, you need money.
You need money to file for the patent.
You need money to pursue those who abuse your patent.
You would probably be better off just publishing it.
Most often you basically just have to do back ground research in the given area. This is why when academics do research projects they start of by learning about the history (back ground) of the area all the way up to the current methods or theories being used. It also helps to ask someone who knows the area and has worked in it for many years.
Well, if it's in a textbook like your algorithm seems to have been (Dijkstra), then it definitely already exists in the public domain and cannot be patented. How you use the algorithm in your application as a whole might be, but most abstract ideas or implementations thereof (such as "finding the shortest path between two nodes") cannot be patented.
Or, you could waste a bunch of money and submit a patent and see what happens :)
In all seriousness though, you might start by searching for existing patents, or read up on some articles like this one to get a better feel for the patent process.
Tracking down every algorithm for a particular problem would be quite daunting. A better process might be to track down the best solutions known for the problem and compare them with yours.
I would start with Wikipedia. I know people say "don't use wiki for research", but it's pretty good at computer science (all those geeks contributing), and it will tell you pretty quickly what the best widely-known algorithms are. If you've got something strictly better than the algorithms you can find in Wikipedia, then it might be worth looking further. If Wikipedia's got something strictly better than your algorithm, then you've invented a curiosity at best and probably won't get rich or famous from it.
Next, check the references at the bottom; they may lead you to papers (which will have more references that you can follow), or to academics' websites (which might have links). Also go to Citeseer and search for key words.
Unfortunately, there's no real replacement for having some basic knowledge. If you've invented (for instance) a graph-theoretic algorithm, but you don't know the language of graph-theory, then you'll struggle to find it because you won't know where to start looking. You might profitably spend your time reading an algorithms textbook -- that will give you an overview of good algorithms and how to speak about them.
If an algorithm or method can not be found right away (wikipedia/google), i find it rewarding to scan academic/engineering websites (web of science, ieee explore, acm etc.) for 'review' papers. If recent, they can give a solid overview over the field (e.g. graph search) mentioning books, papers and conferences. After that one can focus the search on particular methods.
When faced with a problem in software I usually see a solution right away. Of course, what I see is usually somewhat off, and I always need to sit down and design (admittedly, I usually don't design enough), but I get a certain intuition right away.
My problem is I don't get that same intuition when it comes to advanced algorithms. I feel much more up to the task of building another Facebook then building another Google search, or a Music Genom project. It's probably because I've been building software for quite some time, but I have little experience with composing algorithms.
I would like the community's advice on what to read and what projects to undertake to be better at composing algorithms.
(This question has nothing to do with Algorithmic composition. Well, almost nothing)
+1 To whoever said experience is the best teacher.
There are several online portals which have a lot of programming problems, that you can submit your own solutions to, and get an automated pass/fail indication.
http://www.spoj.pl/
http://uva.onlinejudge.org/
http://www.topcoder.com/tc
http://code.google.com/codejam/contests.html
http://projecteuler.net/
https://codeforces.com
https://leetcode.com
The USACO training site is the training program that all USA computing olympiad participants go through. It goes step by step, introducing more and more complex algorithms as you go.
You might find it helpful to perform algorithms physically. For example, when you're studying sorting algorithms, practice doing each one with a deck of cards. That will activate different parts of your brain than reading or programming alone will.
Steve Yegge referred to "The Algorithm Design Manual" in one of his rants. I haven't seen it myself, but it sounds like it's just the ticket from his description.
My absolute favorite for this kind of interview preparation is Steven Skiena's The Algorithm Design Manual. More than any other book it helped me understand just how astonishingly commonplace (and important) graph problems are – they should be part of every working programmer's toolkit. The book also covers basic data structures and sorting algorithms, which is a nice bonus. But the gold mine is the second half of the book, which is a sort of encyclopedia of 1-pagers on zillions of useful problems and various ways to solve them, without too much detail. Almost every 1-pager has a simple picture, making it easy to remember. This is a great way to learn how to identify hundreds of problem types.
problem domain
First you must understand the problem domain. An elegant solution to the wrong problem is no good, nor is an inefficient solution to the right problem in most cases. Solution quality, in other words, is often relative. A simple scheduling problem that has a deterministic solution that takes ten minutes to run may be fine if schedules are realculated once per week, but if schedules change several times a day then a genetic algorithm solution that converges in a few seconds may be required.
decomposition and mapping
Second, decompose the problem into sub-problems and known/unknown elements that correspond to elements of the solution. Sometimes this is obvious, e.g. to count widgets you need a way of identifying widgets, an incrementable counter, and a way of storing the count. Sometimes it is not so obvious. Sometimes you have to decompose the problem, the domain, and possible solutions at the same time and try several different mappings between them to find one that leads to the correct results [this is the general method].
model
Model the solution, in your head at least, and walk through it to see if it works correctly. Adjust as necessary (See decomposition and mapping, above).
composition/interfaces
Many times you can find elements of the problem and elements of the solution that map to each other and produce partial results that are useful. This composition and interface construction provides the kernal of the solution, and also serves to reduce the scope of the problem remaining. So then you just loop back to the top with a smaller initial problem, and go through it again.
experience
Experience is the best teacher, of course, but reading about different kinds of problems and solutions will also be helpful. Studying some of the well-known algorithms and their applications is likewise very helpful, e.g. Dijkstra, Bresenham, Unification, and of course, graph theory.
I am not sure intuition can be cultivated, but I think I know what you are asking. The more problems you solve, the more information and experience you have at your disposal for future problems. So, I say just practice. Practice programming real world applications and you run into plenty of problems. Sometimes, solving puzzles can be very educational as well.
I try to find physical analogues when I'm looking at a complex problem.